Kevin created AVRO-3451:
---------------------------

             Summary: fix poor Avro write performance
                 Key: AVRO-3451
                 URL: https://issues.apache.org/jira/browse/AVRO-3451
             Project: Apache Avro
          Issue Type: Improvement
          Components: rust
    Affects Versions: 1.11.0
         Environment: Mac OS X Big Sur
{code:java}
installed toolchains
--------------------
stable-x86_64-apple-darwin (default)
nightly-x86_64-apple-darwin

active toolchain
----------------
stable-x86_64-apple-darwin (default)
rustc 1.56.1 (59eed8a2a 2021-11-01) {code}
            Reporter: Kevin


The Rust implementation of the Apache Avro library, apache-avro (née avro-rs), 
shows poor write performance when serializing Rust structures to Avro. 
Profiling indicates that it spends an inordinate amount of time in the 
function {{encode::encode_ref}} performing {{clone()}} and {{drop}} operations 
on a {{HashMap<String, Schema>}}.

We modified {{encode_ref}} and its inner helper {{encode_ref0}} as follows:
{code:java}
-pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
-    fn encode_ref0(
+pub fn encode_ref<'a>(value: &Value, schema: &'a Schema, buffer: &mut Vec<u8>) {
+    fn encode_ref0<'a>(
         value: &Value,
-        schema: &Schema,
+        schema: &'a Schema,
         buffer: &mut Vec<u8>,
-        schemas_by_name: &mut HashMap<String, Schema>,
+        schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
     ) {
         match &schema {
             Schema::Ref { ref name } => {
-                let resolved = schemas_by_name.get(name.name.as_str()).unwrap();
+                let resolved = schemas_by_name.get(&name.name as &str).unwrap();
                 return encode_ref0(value, resolved, buffer, &mut schemas_by_name.clone());
             }
             Schema::Record { ref name, .. }
             | Schema::Enum { ref name, .. }
             | Schema::Fixed { ref name, .. } => {
-                schemas_by_name.insert(name.name.clone(), schema.clone());
+                schemas_by_name.insert(&name.name, &schema);
             }
             _ => (),
         }{code}
This removes the need to clone names and schemas when populating the 
{{schemas_by_name}} cache. With this change we see a notable improvement 
(a factor of 4 to 5) in our application.
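
To illustrate the idea behind the patch in isolation, here is a minimal sketch that contrasts an owned name cache with a borrowed one. The {{Schema}} enum below is a hypothetical, heavily simplified stand-in and not the apache-avro type, and the helpers {{collect_owned}}, {{collect_borrowed}} and {{resolve}} are illustrative names that do not exist in the crate.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the crate's Schema type.
#[derive(Clone, Debug, PartialEq)]
enum Schema {
    Long,
    Record { name: String, fields: Vec<Schema> },
    Ref { name: String },
}

// Owned cache: each named schema (and its whole subtree) is deep-cloned on insert.
fn collect_owned(schema: &Schema, names: &mut HashMap<String, Schema>) {
    if let Schema::Record { name, fields } = schema {
        names.insert(name.clone(), schema.clone());
        for field in fields {
            collect_owned(field, names);
        }
    }
}

// Borrowed cache: keys and values are references into the schema tree,
// tied to its lifetime 'a, so inserting copies only two pointers.
fn collect_borrowed<'a>(schema: &'a Schema, names: &mut HashMap<&'a str, &'a Schema>) {
    if let Schema::Record { name, fields } = schema {
        names.insert(name.as_str(), schema);
        for field in fields {
            collect_borrowed(field, names);
        }
    }
}

// Follow a Ref through the borrowed cache; any other schema resolves to itself.
fn resolve<'a>(
    schema: &'a Schema,
    names: &HashMap<&'a str, &'a Schema>,
) -> Option<&'a Schema> {
    match schema {
        Schema::Ref { name } => names.get(name.as_str()).copied(),
        other => Some(other),
    }
}
```

In the borrowed variant the cache cannot outlive the schema it points into, which is what the added {{'a}} lifetime in the patch expresses. Cloning such a map copies references only, not schema subtrees.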

After this change, all Cargo tests still pass and the benchmarks show a 
significant improvement in write performance across the board.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
