[ https://issues.apache.org/jira/browse/AVRO-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David McIntosh updated AVRO-1332:
---------------------------------
Attachment: AVRO-1332-3.patch
Yes, the readers and writers can be cached and reused, and they should be
thread-safe as well. I think it is best for users to manage that caching
themselves if performance is a concern in their app.
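
For anyone wondering what managing the cache in the application could look like, here is a minimal, language-agnostic sketch in Python. It is not code from the patch: `PreResolvedReader` and `ReaderCache` are hypothetical names, and the constructor merely stands in for the one-time schema resolution that a real datum reader would perform. The idea is just to build one reader per (writer schema, reader schema) pair and hand the same instance to every caller.

```python
import threading


class PreResolvedReader:
    """Hypothetical stand-in for a datum reader that resolves schemas once."""

    resolutions = 0  # how many times the (simulated) resolution step has run

    def __init__(self, writer_schema: str, reader_schema: str):
        # Assumption: in a real reader this is the expensive part
        # (resolving the writer schema against the reader schema).
        PreResolvedReader.resolutions += 1
        self.writer_schema = writer_schema
        self.reader_schema = reader_schema


class ReaderCache:
    """Application-level cache, keyed by the schema pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._readers = {}

    def get(self, writer_schema: str, reader_schema: str) -> PreResolvedReader:
        key = (writer_schema, reader_schema)
        # The lock makes sure two threads never resolve the same pair twice.
        with self._lock:
            reader = self._readers.get(key)
            if reader is None:
                reader = PreResolvedReader(writer_schema, reader_schema)
                self._readers[key] = reader
            return reader


cache = ReaderCache()
first = cache.get('"string"', '"string"')
second = cache.get('"string"', '"string"')  # reuses the reader built above
```

Sharing one instance across threads like this is only safe if the reader itself is thread-safe, which is the expectation stated above rather than something this sketch verifies.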
It looks like the Complex example ended up slower because the benefit of
pre-resolving was small and there was extra overhead when processing the
results of the pre-resolution. The Complex schema has a lot of unions and
arrays of basic types, which won't see much speedup. I was able to make a few
tweaks to shrink the time gap, though. I also discovered that one of the unit
tests was failing, and the fix slowed down the new specific writer slightly.
Here are the new results for batch size 1000. I also included two other
schema types: Narrow is a schema with 3 primitive fields, and Wide has 35
fields, mostly primitives plus a few child records.
Serializing (time, lower is better)
|type|old specific|new specific|old generic|new generic|
|simple|1950|1513|2496|1904|
|complex|14696|16380|13806|14945|
|narrow|1030|796|1217|952|
|wide|16599|14586|13167|10655|
Deserializing (time, lower is better)
|type|old specific|new specific|old generic|new generic|
|simple|4321|905|5647|1669|
|complex|28158|13541|25631|14071|
|narrow|2355|515|2854|764|
|wide|25116|5319|30295|10093|
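
To make the tables easier to compare, the old/new ratios can be computed directly. This is only a small arithmetic sketch in Python: the numbers are copied from the tables above, while the `speedup` helper and the dictionary layout are my own. A ratio above 1 means the new implementation is faster, and a ratio below 1 (the Complex serializing case) means it is slower.

```python
# (old, new) times copied from the tables above, keyed by schema type.
RESULTS = {
    "serializing specific": {
        "simple": (1950, 1513), "complex": (14696, 16380),
        "narrow": (1030, 796), "wide": (16599, 14586),
    },
    "serializing generic": {
        "simple": (2496, 1904), "complex": (13806, 14945),
        "narrow": (1217, 952), "wide": (13167, 10655),
    },
    "deserializing specific": {
        "simple": (4321, 905), "complex": (28158, 13541),
        "narrow": (2355, 515), "wide": (25116, 5319),
    },
    "deserializing generic": {
        "simple": (5647, 1669), "complex": (25631, 14071),
        "narrow": (2854, 764), "wide": (30295, 10093),
    },
}


def speedup(old: float, new: float) -> float:
    """Return old/new: above 1 the new code is faster, below 1 it is slower."""
    return old / new


for table, rows in RESULTS.items():
    print(table)
    for schema_type, (old, new) in rows.items():
        print(f"  {schema_type}: {speedup(old, new):.2f}x")
```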
> Improve C# DatumReader performance
> ----------------------------------
>
> Key: AVRO-1332
> URL: https://issues.apache.org/jira/browse/AVRO-1332
> Project: Avro
> Issue Type: Improvement
> Components: csharp
> Affects Versions: 1.7.5
> Reporter: David McIntosh
> Priority: Minor
> Labels: performance
> Attachments: AVRO-1332-2.patch, AVRO-1332-3.patch, AVRO-1332.patch
>
>
> The current implementations of the C# datum readers perform resolution of the
> reader and writer schema on every call to Read. In my tests this was causing
> it to perform poorly when reading a large number of records (slower than
> parsing the same data from delimited text files). It would be more efficient
> if the reader only needed to resolve the schemas once.