[ https://issues.apache.org/jira/browse/AVRO-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Tzvetanov Grigorov resolved AVRO-3451.
---------------------------------------------
Fix Version/s: 1.11.1, 1.12.0
Assignee: Jack Klamer
Resolution: Fixed
> fix poor Avro write performance
> -------------------------------
>
> Key: AVRO-3451
> URL: https://issues.apache.org/jira/browse/AVRO-3451
> Project: Apache Avro
> Issue Type: Improvement
> Components: rust
> Affects Versions: 1.11.0
> Environment: Mac OS X Big Sur
> {code:java}
> installed toolchains
> --------------------
> stable-x86_64-apple-darwin (default)
> nightly-x86_64-apple-darwin
> active toolchain
> ----------------
> stable-x86_64-apple-darwin (default)
> rustc 1.56.1 (59eed8a2a 2021-11-01) {code}
> Reporter: Kevin
> Assignee: Jack Klamer
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.1, 1.12.0
>
> Attachments: Screen Shot 2022-03-14 at 7.30.24 PM.png
>
> Original Estimate: 1h
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The Rust implementation of the Apache Avro library, apache-avro (née avro-rs),
> demonstrates poor write performance when serializing Rust structures to Avro.
> Profiling indicates that this implementation spends an inordinate amount of
> time in the function {{encode::encode_ref}} performing {{clone()}} and
> {{drop}} operations on a {{HashMap<String, Schema>}}.
> We modified {{encode_ref}} and its inner function {{encode_ref0}} as follows:
> {code:java}
> -pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
> -    fn encode_ref0(
> +pub fn encode_ref<'a>(value: &Value, schema: &'a Schema, buffer: &mut Vec<u8>) {
> +    fn encode_ref0<'a>(
>          value: &Value,
> -        schema: &Schema,
> +        schema: &'a Schema,
>          buffer: &mut Vec<u8>,
> -        schemas_by_name: &mut HashMap<String, Schema>,
> +        schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
>      ) {
>          match &schema {
>              Schema::Ref { ref name } => {
> -                let resolved = schemas_by_name.get(name.name.as_str()).unwrap();
> +                let resolved = schemas_by_name.get(&name.name as &str).unwrap();
>                  return encode_ref0(value, resolved, buffer, &mut schemas_by_name.clone());
>              }
>              Schema::Record { ref name, .. }
>              | Schema::Enum { ref name, .. }
>              | Schema::Fixed { ref name, .. } => {
> -                schemas_by_name.insert(name.name.clone(), schema.clone());
> +                schemas_by_name.insert(&name.name, &schema);
>              }
>              _ => (),
>          }{code}
> This removes any need to clone schemas into the {{schemas_by_name}} cache,
> and with this change we see a notable improvement (a factor of 4 to 5) in
> our application. After the change, all Cargo tests still pass, and the
> benchmarks show a significant improvement in write performance across the
> board. The attached screenshot shows one example benchmark,
> {{big schema, write 10k records}}, with before on the left and after on the
> right.
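The change described above swaps an owned cache ({{HashMap<String, Schema>}}), where every insert and every {{clone()}} of the map deep-copies schemas, for a borrowed cache ({{HashMap<&'a str, &'a Schema>}}) whose entries point into the schema tree. Below is a minimal sketch of that borrowed-cache pattern. It assumes hypothetical, heavily simplified {{Schema}} and {{Value}} enums and hypothetical functions {{encode}} and {{encode_int}}; none of these are the real apache-avro types or API. Only the {{int}} zig-zag varint encoding follows the Avro specification.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for the apache-avro `Schema` enum.
#[derive(Debug)]
enum Schema {
    Int,
    Record { name: String, fields: Vec<Schema> },
    Ref { name: String },
}

/// Hypothetical stand-in for the apache-avro `Value` type.
#[derive(Debug)]
enum Value {
    Int(i32),
    Record(Vec<Value>),
}

/// Avro `int` encoding: zig-zag, then variable-length base-128 (varint).
fn encode_int(n: i32, buffer: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 31)) as u32;
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            buffer.push(byte);
            return;
        }
        buffer.push(byte | 0x80);
    }
}

/// Hypothetical encoder that caches named schemas by reference.
/// Keys and values borrow from the schema tree (lifetime `'a`), so neither
/// inserting into nor cloning the cache copies a `Schema`.
fn encode<'a>(
    value: &Value,
    schema: &'a Schema,
    buffer: &mut Vec<u8>,
    schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
) {
    match schema {
        Schema::Ref { name } => {
            // Panics on an unknown name, like the `unwrap()` in the diff.
            let resolved: &'a Schema = schemas_by_name[name.as_str()];
            // Cloning a map of references copies pointers only, not schemas.
            encode(value, resolved, buffer, &mut schemas_by_name.clone());
        }
        Schema::Record { name, fields } => {
            // Store a borrowed key and value; no `String` or `Schema` clone.
            schemas_by_name.insert(name.as_str(), schema);
            if let Value::Record(values) = value {
                for (v, field) in values.iter().zip(fields) {
                    encode(v, field, buffer, schemas_by_name);
                }
            }
        }
        Schema::Int => {
            if let Value::Int(n) = value {
                encode_int(*n, buffer);
            }
        }
    }
}
```

Because the map holds only references, the {{clone()}} in the {{Ref}} arm is cheap. The lifetime {{'a}} ties every cached entry to the schema passed in, so the compiler rejects any attempt to keep the cache alive longer than that schema.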
--
This message was sent by Atlassian Jira
(v8.20.1#820001)