[jira] [Updated] (AVRO-3451) fix poor Avro write performance

Kevin (Jira) Mon, 14 Mar 2022 19:33:08 -0700


     [ https://issues.apache.org/jira/browse/AVRO-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Kevin updated AVRO-3451:
------------------------
    Description: 
The Rust implementation of the Apache Avro library, apache-avro (née avro-rs), 
shows poor write performance when serializing Rust structures to Avro. 
Profiling indicates that it spends an inordinate amount of time in the function 
{{encode::encode_ref}} performing {{clone()}} and {{drop}} operations on a 
{{HashMap<String, Schema>}}.

We modified {{encode_ref}} and its inner helper {{encode_ref0}} as follows:
{code:java}
-pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
-    fn encode_ref0(
+pub fn encode_ref<'a>(value: &Value, schema: &'a Schema, buffer: &mut Vec<u8>) {
+    fn encode_ref0<'a>(
         value: &Value,
-        schema: &Schema,
+        schema: &'a Schema,
         buffer: &mut Vec<u8>,
-        schemas_by_name: &mut HashMap<String, Schema>,
+        schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
     ) {
         match &schema {
             Schema::Ref { ref name } => {
-                let resolved = schemas_by_name.get(name.name.as_str()).unwrap();
+                let resolved = schemas_by_name.get(&name.name as &str).unwrap();
                 return encode_ref0(value, resolved, buffer, &mut schemas_by_name.clone());
             }
             Schema::Record { ref name, .. }
             | Schema::Enum { ref name, .. }
             | Schema::Fixed { ref name, .. } => {
-                schemas_by_name.insert(name.name.clone(), schema.clone());
+                schemas_by_name.insert(&name.name, &schema);
             }
             _ => (),
         }
{code}
This removes the need to clone {{String}} and {{Schema}} values into the 
{{schemas_by_name}} cache, which now holds borrowed references instead. We see 
a notable improvement (a factor of 4 to 5) in our application with this change.
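
For readers less familiar with the pattern, the following is a minimal, 
self-contained sketch of the same idea on a toy schema type. It is not the 
apache-avro code: {{ToySchema}}, {{collect_owned}}, {{collect_borrowed}}, 
{{resolve}} and {{sample_schema}} are hypothetical names invented for this 
illustration. It contrasts a cache that deep-clones every named schema with one 
that only borrows from the schema tree for the lifetime {{'a}}.

```rust
use std::collections::HashMap;

// Toy stand-in for a schema tree. This is NOT the apache-avro `Schema` enum;
// all names in this sketch are hypothetical.
#[derive(Clone, Debug, PartialEq)]
enum ToySchema {
    Int,
    Record { name: String, fields: Vec<ToySchema> },
    Ref { name: String },
}

/// Owned cache: each named schema (and its name) is deep-cloned into the map.
fn collect_owned(schema: &ToySchema, cache: &mut HashMap<String, ToySchema>) {
    if let ToySchema::Record { name, fields } = schema {
        cache.insert(name.clone(), schema.clone());
        for field in fields {
            collect_owned(field, cache);
        }
    }
}

/// Borrowed cache: keys and values are references into the schema tree,
/// so inserting allocates nothing beyond the map entry itself.
fn collect_borrowed<'a>(
    schema: &'a ToySchema,
    cache: &mut HashMap<&'a str, &'a ToySchema>,
) {
    if let ToySchema::Record { name, fields } = schema {
        cache.insert(name.as_str(), schema);
        for field in fields {
            collect_borrowed(field, cache);
        }
    }
}

/// Resolve a `Ref` through the borrowed cache; other schemas resolve to themselves.
fn resolve<'a>(
    schema: &'a ToySchema,
    cache: &HashMap<&'a str, &'a ToySchema>,
) -> Option<&'a ToySchema> {
    match schema {
        ToySchema::Ref { name } => cache.get(name.as_str()).copied(),
        other => Some(other),
    }
}

/// A small nested schema: `Outer` contains `Inner` and a reference back to it.
fn sample_schema() -> ToySchema {
    ToySchema::Record {
        name: "Outer".to_string(),
        fields: vec![
            ToySchema::Int,
            ToySchema::Record {
                name: "Inner".to_string(),
                fields: vec![ToySchema::Int],
            },
            ToySchema::Ref { name: "Inner".to_string() },
        ],
    }
}

fn main() {
    let schema = sample_schema();

    let mut owned = HashMap::new();
    collect_owned(&schema, &mut owned);

    let mut borrowed = HashMap::new();
    collect_borrowed(&schema, &mut borrowed);

    // Both caches know the same named schemas.
    assert_eq!(owned.len(), borrowed.len());

    // A reference resolves to the original node, not to a copy of it.
    let reference = ToySchema::Ref { name: "Inner".to_string() };
    let resolved = resolve(&reference, &borrowed).expect("Inner is registered");
    assert!(std::ptr::eq(resolved, borrowed["Inner"]));
    assert_eq!(resolved, &owned["Inner"]);
}
```

In this sketch the borrowed map can still be cloned cheaply, because cloning it 
copies only references rather than whole schema values.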

After this change, all Cargo tests still pass and the benchmarks show a very 
significant improvement in write performance across the board. The attached 
screenshot shows one example benchmark, {{big schema, write 10k records}}, with 
before on the left and after on the right.



> fix poor Avro write performance
> -------------------------------
>
>                 Key: AVRO-3451
>                 URL: https://issues.apache.org/jira/browse/AVRO-3451
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: rust
>    Affects Versions: 1.11.0
>         Environment: Mac OS X Big Sur
> {code:java}
> installed toolchains
> --------------------
> stable-x86_64-apple-darwin (default)
> nightly-x86_64-apple-darwin
> active toolchain
> ----------------
> stable-x86_64-apple-darwin (default)
> rustc 1.56.1 (59eed8a2a 2021-11-01) {code}
>            Reporter: Kevin
>            Priority: Major
>         Attachments: Screen Shot 2022-03-14 at 7.30.24 PM.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
