Martin Jubelgas created AVRO-2247:
-------------------------------------

             Summary: Improve Java reading performance with a new reader
                 Key: AVRO-2247
                 URL: https://issues.apache.org/jira/browse/AVRO-2247
             Project: Avro
          Issue Type: Improvement
          Components: java
            Reporter: Martin Jubelgas
             Fix For: 1.9.0
         Attachments: Perf-Comparison.md

Complementary to AVRO-2090, I have been working on decoding of Avro objects in 
Java and am suggesting a new implementation of a DatumReader that improves read 
performance for both generic and specific records by approximately 20% (and 
even more in cases of nested objects with defaults, a case I encounter a lot in 
practical use).

The key concept is to create a detailed execution plan once per DatumReader. 
This execution plan contains all required default and lookup values, so they 
need not be looked up again during object traversal while reading.
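
To illustrate the idea of a precomputed plan, here is a minimal, self-contained 
sketch. It is not the code from the proposed patch: the class ReadPlanSketch, 
the Step interface, the simplified Field type and the use of a String array as 
"encoded input" are all hypothetical stand-ins chosen for illustration. The 
point is only that field positions and default values are resolved once in 
buildPlan, so that read does no lookups per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReadPlanSketch {

    /** One precomputed step: how to populate a single field of the result. */
    interface Step {
        void apply(String[] input, Map<String, Object> target);
    }

    /** Hypothetical, simplified reader field (not Avro's Schema.Field). */
    static final class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /**
     * Resolves writer positions and defaults once. The returned plan can be
     * reused for every record written with the same writer field order.
     */
    static List<Step> buildPlan(List<String> writerFields, List<Field> readerFields) {
        List<Step> plan = new ArrayList<>();
        for (Field field : readerFields) {
            final String name = field.name;
            final int position = writerFields.indexOf(name);
            if (position >= 0) {
                // Field exists in the written data: copy it from a fixed position.
                plan.add((input, target) -> target.put(name, input[position]));
            } else {
                // Field is missing in the written data: use the captured default.
                final Object defaultValue = field.defaultValue;
                plan.add((input, target) -> target.put(name, defaultValue));
            }
        }
        return plan;
    }

    /** Executes the plan for one record; no name or default lookups happen here. */
    static Map<String, Object> read(List<Step> plan, String[] input) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Step step : plan) {
            step.apply(input, record);
        }
        return record;
    }

    public static void main(String[] args) {
        List<Step> plan = buildPlan(
            Arrays.asList("id", "name"),
            Arrays.asList(
                new Field("id", null),
                new Field("name", null),
                new Field("country", "unknown")));

        // The same plan is reused for both records.
        System.out.println(read(plan, new String[] {"1", "alice"}));
        System.out.println(read(plan, new String[] {"2", "bob"}));
    }
}
```

In this sketch the cost of matching reader fields against writer fields is 
paid once in buildPlan; a real reader would additionally have to handle type 
promotion, nested records and binary decoding.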

The reader implementation can be enabled and disabled per GenericData instance. 
The system-wide default is set via the Java system property 
"org.apache.avro.fastread" (defaults to "false").
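
As a rough sketch of how such a switch could behave, the following example 
reads the default from "org.apache.avro.fastread" and allows a per-instance 
override. It assumes the setting is read with System.getProperty; the class 
FastReadToggleSketch, the DataModel type and its method names are hypothetical 
and are not taken from the patch or from GenericData.

```java
public class FastReadToggleSketch {

    /** Property name as given in this issue. */
    static final String FAST_READ_PROPERTY = "org.apache.avro.fastread";

    /** Reads the system-wide default; an unset property yields false. */
    static boolean defaultFromSystemProperty() {
        return Boolean.parseBoolean(System.getProperty(FAST_READ_PROPERTY, "false"));
    }

    /** Hypothetical stand-in for a GenericData-like instance holding the flag. */
    static final class DataModel {
        // Captured at construction time, then adjustable per instance.
        private boolean fastReaderEnabled = defaultFromSystemProperty();

        boolean isFastReaderEnabled() {
            return fastReaderEnabled;
        }

        void setFastReaderEnabled(boolean enabled) {
            this.fastReaderEnabled = enabled;
        }
    }

    public static void main(String[] args) {
        DataModel model = new DataModel();
        System.out.println("default: " + model.isFastReaderEnabled());

        // Per-instance override, independent of the system-wide default.
        model.setFastReaderEnabled(true);
        System.out.println("after override: " + model.isFastReaderEnabled());
    }
}
```

Under this assumption, the system-wide default could be switched on at JVM 
startup with -Dorg.apache.avro.fastread=true.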

Attached is a performance comparison of the existing implementation with the 
proposed one. I will open a pull request with the respective code shortly (not 
yet including interoperability with the optimizations of AVRO-2090). Please let 
me know whether you think this is worth pursuing further.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
