[jira] [Commented] (AVRO-3451) fix poor Avro write performance

Jack Klamer (Jira) Wed, 16 Mar 2022 18:58:04 -0700


    [ 
https://issues.apache.org/jira/browse/AVRO-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507940#comment-17507940
 ]


Jack Klamer commented on AVRO-3451:
-----------------------------------

Hey! Thank you for testing against it! One thing to note that this PR does 
not address: the resolved schema (as it's called now; basically a Name -> 
&Schema lookup) can be reused for every encoding. However, the writer does not 
take advantage of that, so it builds the resolved schema on every "encode" 
call. That at least isn't cloning, but it's still allocating space to store 
the schema references. So if the messages you're writing include a lot of 
optional fields of records, you could get a situation where the number of hash 
map allocations (smaller ones, but still allocations) is much higher. It could 
also be the case that the cloning I'm doing to create the Name key is somehow 
more costly than the cloning that was happening to create the string key. Once 
this PR is in, it should be easy to add a new class/writer encoding function 
that takes advantage of the ResolvedSchema, and to work on optimizing it 
further.
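
To illustrate the reuse idea, here is a minimal sketch of building a borrowed 
name -> schema lookup once and handing it to every encode call. Everything in 
it is an assumption made for illustration: the toy Schema enum, the 
ResolvedNames type, sample_schema and this encode function are hypothetical 
stand-ins, not the apache-avro API, and the byte output is a placeholder 
rather than real Avro binary encoding.

```rust
use std::collections::HashMap;

// Toy stand-in for an Avro schema; NOT the apache-avro `Schema` type.
enum Schema {
    Long,
    Ref { name: String },
    Record { name: String, fields: Vec<(String, Schema)> },
}

// Hypothetical lookup that borrows names and schemas instead of cloning them.
struct ResolvedNames<'s> {
    by_name: HashMap<&'s str, &'s Schema>,
}

impl<'s> ResolvedNames<'s> {
    // Walk the schema once and record every named (record) schema.
    fn new(root: &'s Schema) -> Self {
        let mut by_name = HashMap::new();
        Self::collect(root, &mut by_name);
        ResolvedNames { by_name }
    }

    fn collect(schema: &'s Schema, out: &mut HashMap<&'s str, &'s Schema>) {
        if let Schema::Record { name, fields } = schema {
            out.insert(name.as_str(), schema);
            for (_, field_schema) in fields {
                Self::collect(field_schema, out);
            }
        }
    }

    fn get(&self, name: &str) -> Option<&'s Schema> {
        self.by_name.get(name).copied()
    }
}

// Toy encoder: writes `value` as 8 little-endian bytes for each Long it
// reaches. It only reads the prebuilt lookup, so no map is allocated per call.
fn encode(value: i64, schema: &Schema, names: &ResolvedNames<'_>, buffer: &mut Vec<u8>) {
    match schema {
        Schema::Long => buffer.extend_from_slice(&value.to_le_bytes()),
        Schema::Ref { name } => {
            let resolved = names.get(name).expect("unknown schema name");
            encode(value, resolved, names, buffer);
        }
        Schema::Record { fields, .. } => {
            for (_, field_schema) in fields {
                encode(value, field_schema, names, buffer);
            }
        }
    }
}

// Hypothetical sample: record "Pair" whose second field refers back to "Point".
fn sample_schema() -> Schema {
    let point = Schema::Record {
        name: "Point".to_string(),
        fields: vec![("x".to_string(), Schema::Long)],
    };
    Schema::Record {
        name: "Pair".to_string(),
        fields: vec![
            ("first".to_string(), point),
            ("second".to_string(), Schema::Ref { name: "Point".to_string() }),
        ],
    }
}
```

A caller would build ResolvedNames::new(&schema) once and then pass a 
reference to it into encode for each message, so the per-message cost is only 
the hash lookups and not a fresh map allocation.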

However, there could very well be something else entirely causing your issue 
that we will need to address. 

> fix poor Avro write performance
> -------------------------------
>
>                 Key: AVRO-3451
>                 URL: https://issues.apache.org/jira/browse/AVRO-3451
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: rust
>    Affects Versions: 1.11.0
>         Environment: Mac OS X Big Sur
> {code:java}
> installed toolchains
> --------------------
> stable-x86_64-apple-darwin (default)
> nightly-x86_64-apple-darwin
> active toolchain
> ----------------
> stable-x86_64-apple-darwin (default)
> rustc 1.56.1 (59eed8a2a 2021-11-01) {code}
>            Reporter: Kevin
>            Priority: Major
>         Attachments: Screen Shot 2022-03-14 at 7.30.24 PM.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The Rust implementation of the Apache Avro library – apache-avro (née 
> avro-rs) – demonstrates poor write performance when serializing Rust 
> structures to Avro. Profiling indicates that this implementation spends an 
> inordinate amount of time in the function {{encode::encode_ref}} performing 
> {{clone()}} and {{drop}} operations related to a HashMap<String, Schema> type.
> We modified the function {{encode_ref0}} as follows:
> {code:java}
> -pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
> -    fn encode_ref0(
> +pub fn encode_ref<'a>(value: &Value, schema: &'a Schema, buffer: &mut Vec<u8>) {
> +    fn encode_ref0<'a>(
>          value: &Value,
> -        schema: &Schema,
> +        schema: &'a Schema,
>          buffer: &mut Vec<u8>,
> -        schemas_by_name: &mut HashMap<String, Schema>,
> +        schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
>      ) {
>          match &schema {
>              Schema::Ref { ref name } => {
> -                let resolved = schemas_by_name.get(name.name.as_str()).unwrap();
> +                let resolved = schemas_by_name.get(&name.name as &str).unwrap();
>                  return encode_ref0(value, resolved, buffer, &mut schemas_by_name.clone());
>              }
>              Schema::Record { ref name, .. }
>              | Schema::Enum { ref name, .. }
>              | Schema::Fixed { ref name, .. } => {
> -                schemas_by_name.insert(name.name.clone(), schema.clone());
> +                schemas_by_name.insert(&name.name, &schema);
>              }
>              _ => (),
>          }{code}
> to remove any need for Clone in the {{schemas_by_name}} cache and see a 
> notable improvement (factor of 4 to 5) in our application with this change.
> After this change, all Cargo Tests still pass and Benchmarks display a very 
> significant improvement in Write performance across the board. Attached below 
> is one example benchmark for {{big schema, write 10k records}} with Before on 
> the Left and After on the Right.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
