[ 
https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David McIntosh updated AVRO-1332:
---------------------------------

    Attachment: AVRO-1332-2.patch

This patch is updated to accommodate the changes that went in for 1.7.5. It 
includes new versions of the datum readers and writers that pre-resolve the 
schemas, so that large numbers of records can be read and written more 
efficiently. It also includes a patch to the data file API that allows a custom 
datum reader to be specified when reading a file, and a basic performance 
testing project that compares the old and new versions. To verify that the new 
versions pass all unit tests, the default implementations have been rerouted to 
the new versions.
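
As a rough illustration of what pre-resolving buys, here is a minimal sketch 
in Python. It is not the C# code from the patch. The toy schema format, the 
DECODERS table, and the names resolve, read_resolving_each_time, and 
PreResolvedReader are all hypothetical, and type compatibility checks are 
omitted. It only shows the general idea of matching writer and reader fields 
once and reusing the resulting plan for every record.

```python
import struct
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy schema: an ordered list of (field name, type name) pairs.
Schema = List[Tuple[str, str]]
Decoder = Callable[[bytes, int], Tuple[object, int]]
Plan = List[Tuple[Decoder, Optional[str]]]

# One decoder per primitive type; each returns (value, new offset).
DECODERS: Dict[str, Decoder] = {
    "int": lambda buf, pos: (struct.unpack_from("<i", buf, pos)[0], pos + 4),
    "double": lambda buf, pos: (struct.unpack_from("<d", buf, pos)[0], pos + 8),
}


def resolve(writer: Schema, reader: Schema) -> Plan:
    """Match writer fields against reader fields and build a decode plan.

    Each step is (decoder, field name), where the name is None if the
    reader schema does not contain the field and the value is discarded.
    """
    wanted = {name for name, _ in reader}
    return [(DECODERS[typ], name if name in wanted else None)
            for name, typ in writer]


def read_with_plan(plan: Plan, buf: bytes, pos: int = 0):
    """Decode one record by walking an already resolved plan."""
    record = {}
    for decode, name in plan:
        value, pos = decode(buf, pos)
        if name is not None:
            record[name] = value
    return record, pos


def read_resolving_each_time(writer: Schema, reader: Schema,
                             buf: bytes, pos: int = 0):
    """Per-call resolution: the plan is rebuilt for every record."""
    return read_with_plan(resolve(writer, reader), buf, pos)


class PreResolvedReader:
    """Resolves the schemas once in the constructor, then reuses the plan."""

    def __init__(self, writer: Schema, reader: Schema) -> None:
        self._plan = resolve(writer, reader)

    def read(self, buf: bytes, pos: int = 0):
        return read_with_plan(self._plan, buf, pos)
```

Both paths return the same record. The difference is that the second pays 
the resolution cost in the constructor rather than on every read, which is 
why constructing a reader becomes more expensive while each read becomes 
cheaper.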

Some test results are below. Each test performed 1,000,000 serializations of 
various types of records followed by 1,000,000 deserializations. Times are 
in ms. The batch size was also varied to measure the added overhead of 
constructing the new datum readers/writers. A batch size of 10 means that a 
new reader/writer was created every 10 records.
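
The batching scheme can be sketched as follows. This is an illustrative 
Python harness, not the performance testing project from the patch. The 
function name run_batched and its parameters are hypothetical, and 
make_reader stands in for whatever constructs a datum reader or writer.

```python
import time
from typing import Callable, Iterable, Tuple


def run_batched(make_reader: Callable[[], Callable[[bytes], object]],
                records: Iterable[bytes],
                batch_size: int) -> Tuple[float, int]:
    """Process records, building a fresh reader every batch_size records.

    Returns (elapsed milliseconds, number of readers constructed).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    constructed = 0
    reader = None
    start = time.perf_counter()
    for index, record in enumerate(records):
        if index % batch_size == 0:
            reader = make_reader()  # construction cost is paid here
            constructed += 1
        reader(record)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, constructed
```

With a batch size of 1 a reader is constructed for every record, so the 
construction cost dominates. Larger batch sizes spread that cost over more 
records.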

Test 1: Simple record (only primitive Avro types)
serializing:
|batch size|old specific|new specific|old generic|new generic|
|1|2044|6723|2574|7550|
|10|2043|2184|2574|2948|
|100|2012|1669|2543|2402|
|1000|1996|1638|2542|2309|

deserializing:
|batch size|old specific|new specific|old generic|new generic|
|1|4415|63071|6006|13011|
|10|4415|7161|5959|2902|
|100|4400|1513|5944|1857|
|1000|4384|936|5960|1778|

Test 2: Complex record (including arrays, enums, unions, etc.)
serializing:
|batch size|old specific|new specific|old generic|new generic|
|1|16162|44117|14164|45771|
|10|16162|18658|14149|21154|
|100|16131|15943|14118|18501|
|1000|15912|15663|14149|18252|

deserializing:
|batch size|old specific|new specific|old generic|new generic|
|1|28954|997080|26006|79950|
|10|28875|111463|26068|21076|
|100|28657|22620|26021|15054|
|1000|28408|13650|25850|14431|
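
To read the tables as ratios rather than raw times, a small helper such as 
the one below can be used. It is only an illustrative Python sketch. The 
speedup function and the BATCH_1000 dictionary are hypothetical names, and 
the numbers are copied from the batch size 1000 rows above.

```python
def speedup(old_ms: float, new_ms: float) -> float:
    """Ratio of old to new time; above 1.0 means the new version is faster."""
    return old_ms / new_ms


# (old ms, new ms) at batch size 1000, copied from the tables above.
BATCH_1000 = {
    "test 1 deserialize specific": (4384, 936),
    "test 1 deserialize generic": (5960, 1778),
    "test 2 deserialize specific": (28408, 13650),
    "test 2 deserialize generic": (25850, 14431),
    "test 2 serialize generic": (14149, 18252),
}

for label, (old_ms, new_ms) in BATCH_1000.items():
    print(f"{label}: {speedup(old_ms, new_ms):.2f}x")
```

A ratio below 1.0, as for generic serialization in test 2, means the new 
version was slower in that configuration.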


> Improve C# DatumReader performance
> ----------------------------------
>
>                 Key: AVRO-1332
>                 URL: https://issues.apache.org/jira/browse/AVRO-1332
>             Project: Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David McIntosh
>            Priority: Minor
>              Labels: performance
>         Attachments: AVRO-1332-2.patch, AVRO-1332.patch
>
>
> The current implementations of the C# datum readers perform resolution of the 
> reader and writer schema on every call to Read. In my tests this was causing 
> it to perform poorly when reading a large number of records (slower than 
> parsing the same data from delimited text files). It would be more efficient 
> if the reader only needed to resolve the schemas once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
