[ 
https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David McIntosh updated AVRO-1332:
---------------------------------

    Attachment: AVRO-1332-3.patch

Yes, the readers/writers can be cached and reused, and they should be 
thread-safe as well. I think it might be best for users to manage that 
caching themselves if performance is a concern in their app.
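
As a rough illustration of the caching users could manage themselves, here is a minimal sketch in Python rather than C#. It does not use the Avro API: `ReaderCache` and `build_reader` are hypothetical names, and schemas are stood in for by plain strings, assuming only that whatever identifies a schema pair is hashable.

```python
import threading


class ReaderCache:
    """Keeps one reader per (writer schema, reader schema) pair."""

    def __init__(self, build_reader):
        # build_reader is a hypothetical factory that performs the costly
        # schema resolution and returns a reusable reader object.
        self._build_reader = build_reader
        self._readers = {}
        self._lock = threading.Lock()

    def get(self, writer_schema, reader_schema):
        key = (writer_schema, reader_schema)
        # The lock guards the dictionary so that concurrent callers never
        # build two readers for the same schema pair.
        with self._lock:
            reader = self._readers.get(key)
            if reader is None:
                reader = self._build_reader(writer_schema, reader_schema)
                self._readers[key] = reader
            return reader


build_calls = []


def build_reader(writer_schema, reader_schema):
    # Stand-in for real reader construction; records each invocation.
    build_calls.append((writer_schema, reader_schema))
    return object()


cache = ReaderCache(build_reader)
first = cache.get("writer-v1", "reader-v1")
second = cache.get("writer-v1", "reader-v1")  # served from the cache
```

Sharing a cached reader across threads like this is only safe if the reader itself is thread-safe, which is the property described above.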

It looks like the Complex example ended up slower because the benefit of 
pre-resolving was small and processing the results of the pre-resolution 
added extra overhead. The Complex schema has a lot of unions and arrays of 
basic types, which won't see much speedup. I was able to make a few tweaks 
to shrink the time gap, though. I also discovered that one of the unit tests 
was failing, and the fix slowed down the new specific writer slightly.
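
To make the idea of pre-resolving concrete, here is a minimal sketch in Python. It is not the code in the patch: `resolve`, `read_record`, and the field names are hypothetical, and it only handles matching fields by name, with none of the union or array handling mentioned above.

```python
def resolve(writer_fields, reader_fields):
    """Build the plan once: for each writer field, the index it fills in the
    reader's record, or None if the reader does not have that field."""
    reader_index = {name: i for i, name in enumerate(reader_fields)}
    return [reader_index.get(name) for name in writer_fields]


def read_record(plan, values, reader_width):
    """Apply the pre-built plan to one record's values, in writer order."""
    record = [None] * reader_width
    for target, value in zip(plan, values):
        if target is not None:  # fields unknown to the reader are skipped
            record[target] = value
    return record


writer_fields = ["id", "name", "legacy"]
reader_fields = ["name", "id"]

plan = resolve(writer_fields, reader_fields)  # done once, not per record
rows = [(1, "a", "x"), (2, "b", "y")]
records = [read_record(plan, row, len(reader_fields)) for row in rows]
```

The per-record loop still has to walk the plan, which is one way the processing overhead described above can outweigh the savings when the resolution itself was cheap.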

Here are the new results for batch size 1000. I also included two other 
schema types: Narrow is a schema with 3 primitive fields, and Wide has 35 
fields, mostly primitives plus a few child records.

Serializing
|type|old specific|new specific|old generic|new generic|
|simple|1950|1513|2496|1904|
|complex|14696|16380|13806|14945|
|narrow|1030|796|1217|952|
|wide|16599|14586|13167|10655|

Deserializing
|type|old specific|new specific|old generic|new generic|
|simple|4321|905|5647|1669|
|complex|28158|13541|25631|14071|
|narrow|2355|515|2854|764|
|wide|25116|5319|30295|10093|
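
For anyone who wants the relative change rather than the raw numbers, this small Python sketch computes old/new ratios from the two tables above. It assumes the figures are elapsed times (no units are given), so a ratio above 1 means the new code is faster.

```python
# Rows are (old specific, new specific, old generic, new generic),
# copied from the Serializing and Deserializing tables.
results = {
    "serializing": {
        "simple": (1950, 1513, 2496, 1904),
        "complex": (14696, 16380, 13806, 14945),
        "narrow": (1030, 796, 1217, 952),
        "wide": (16599, 14586, 13167, 10655),
    },
    "deserializing": {
        "simple": (4321, 905, 5647, 1669),
        "complex": (28158, 13541, 25631, 14071),
        "narrow": (2355, 515, 2854, 764),
        "wide": (25116, 5319, 30295, 10093),
    },
}


def speedups(rows):
    """Return {type: (specific ratio, generic ratio)} as old time / new time."""
    return {
        name: (old_s / new_s, old_g / new_g)
        for name, (old_s, new_s, old_g, new_g) in rows.items()
    }


speedup = {phase: speedups(rows) for phase, rows in results.items()}

for phase, by_type in speedup.items():
    for name, (specific, generic) in by_type.items():
        print(f"{phase:13} {name:8} specific x{specific:.2f} generic x{generic:.2f}")
```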
                
> Improve C# DatumReader performance
> ----------------------------------
>
>                 Key: AVRO-1332
>                 URL: https://issues.apache.org/jira/browse/AVRO-1332
>             Project: Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David McIntosh
>            Priority: Minor
>              Labels: performance
>         Attachments: AVRO-1332-2.patch, AVRO-1332-3.patch, AVRO-1332.patch
>
>
> The current implementations of the C# datum readers perform resolution of the 
> reader and writer schema on every call to Read. In my tests this was causing 
> it to perform poorly when reading a large number of records (slower than 
> parsing the same data from delimited text files). It would be more efficient 
> if the reader only needed to resolve the schemas once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira