[
https://issues.apache.org/jira/browse/AVRO-1438?focusedWorklogId=744796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744796
]
ASF GitHub Bot logged work on AVRO-1438:
----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Mar/22 23:13
Start Date: 20/Mar/22 23:13
Worklog Time Spent: 10m
Work Description: zcsizmadia commented on pull request #1604:
URL: https://github.com/apache/avro/pull/1604#issuecomment-1073369284
@KyleSchoonover Could you post your before and after measurements?
Here is my diff I applied to `master`:
```
diff --git a/lang/csharp/src/apache/main/Generic/GenericReader.cs b/lang/csharp/src/apache/main/Generic/GenericReader.cs
index f42e572d..45df3b29 100644
--- a/lang/csharp/src/apache/main/Generic/GenericReader.cs
+++ b/lang/csharp/src/apache/main/Generic/GenericReader.cs
@@ -121,6 +121,9 @@ namespace Avro.Generic
{
this.ReaderSchema = readerSchema;
this.WriterSchema = writerSchema;
+
+ if (!ReaderSchema.CanRead(WriterSchema))
+ throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
}
/// <summary>
@@ -134,9 +137,6 @@ namespace Avro.Generic
/// <returns>Object read from the decoder.</returns>
public T Read<T>(T reuse, Decoder decoder)
{
- if (!ReaderSchema.CanRead(WriterSchema))
- throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
-
return (T)Read(reuse, WriterSchema, ReaderSchema, decoder);
}
```
Before:
```
$ dotnet run -c Release -f net6.0
type impl action total_items batches batch_size time(ms)
simple default_specific serialize 1000000 1000 1000 500
simple default_specific deserialize 1000000 1000 1000 1188
simple preresolved_specific serialize 1000000 1000 1000 312
simple preresolved_specific deserialize 1000000 1000 1000 328
simple default_generic serialize 1000000 1000 1000 391
simple default_generic deserialize 1000000 1000 1000 1109
simple preresolved_generic serialize 1000000 1000 1000 250
simple preresolved_generic deserialize 1000000 1000 1000 469
complex default_specific serialize 1000000 1000 1000 3015
complex default_specific deserialize 1000000 1000 1000 7391
complex preresolved_specific serialize 1000000 1000 1000 2438
complex preresolved_specific deserialize 1000000 1000 1000 3406
complex default_generic serialize 1000000 1000 1000 2937
complex default_generic deserialize 1000000 1000 1000 5735
complex preresolved_generic serialize 1000000 1000 1000 2031
complex preresolved_generic deserialize 1000000 1000 1000 2641
narrow default_specific serialize 1000000 1000 1000 203
narrow default_specific deserialize 1000000 1000 1000 547
narrow preresolved_specific serialize 1000000 1000 1000 156
narrow preresolved_specific deserialize 1000000 1000 1000 157
narrow default_generic serialize 1000000 1000 1000 203
narrow default_generic deserialize 1000000 1000 1000 562
narrow preresolved_generic serialize 1000000 1000 1000 141
narrow preresolved_generic deserialize 1000000 1000 1000 219
wide default_specific serialize 1000000 1000 1000 2610
wide default_specific deserialize 1000000 1000 1000 6593
wide preresolved_specific serialize 1000000 1000 1000 2297
wide preresolved_specific deserialize 1000000 1000 1000 2141
wide default_generic serialize 1000000 1000 1000 2109
wide default_generic deserialize 1000000 1000 1000 6235
wide preresolved_generic serialize 1000000 1000 1000 1343
wide preresolved_generic deserialize 1000000 1000 1000 2766
```
After:
```
$ dotnet run -c Release -f net6.0
type impl action total_items batches batch_size time(ms)
simple default_specific serialize 1000000 1000 1000 531
simple default_specific deserialize 1000000 1000 1000 891
simple preresolved_specific serialize 1000000 1000 1000 328
simple preresolved_specific deserialize 1000000 1000 1000 344
simple default_generic serialize 1000000 1000 1000 453
simple default_generic deserialize 1000000 1000 1000 844
simple preresolved_generic serialize 1000000 1000 1000 281
simple preresolved_generic deserialize 1000000 1000 1000 547
complex default_specific serialize 1000000 1000 1000 3579
complex default_specific deserialize 1000000 1000 1000 7218
complex preresolved_specific serialize 1000000 1000 1000 2875
complex preresolved_specific deserialize 1000000 1000 1000 3969
complex default_generic serialize 1000000 1000 1000 3266
complex default_generic deserialize 1000000 1000 1000 4734
complex preresolved_generic serialize 1000000 1000 1000 2109
complex preresolved_generic deserialize 1000000 1000 1000 2797
narrow default_specific serialize 1000000 1000 1000 219
narrow default_specific deserialize 1000000 1000 1000 406
narrow preresolved_specific serialize 1000000 1000 1000 141
narrow preresolved_specific deserialize 1000000 1000 1000 187
narrow default_generic serialize 1000000 1000 1000 204
narrow default_generic deserialize 1000000 1000 1000 375
narrow preresolved_generic serialize 1000000 1000 1000 140
narrow preresolved_generic deserialize 1000000 1000 1000 250
wide default_specific serialize 1000000 1000 1000 2688
wide default_specific deserialize 1000000 1000 1000 4781
wide preresolved_specific serialize 1000000 1000 1000 1969
wide preresolved_specific deserialize 1000000 1000 1000 2078
wide default_generic serialize 1000000 1000 1000 2094
wide default_generic deserialize 1000000 1000 1000 4640
wide preresolved_generic serialize 1000000 1000 1000 1422
wide preresolved_generic deserialize 1000000 1000 1000 2953
```
You are definitely onto something with the CanRead function. E.g. `wide
default_generic deserialize 1000000 1000 1000`
improved from 6235 ms to 4640 ms, about 26% less time (a ~1.34x speedup),
which is massive.
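For reference, a timing change like this can be expressed either as the share of time saved or as a speedup ratio, and the two numbers differ. A minimal sketch of both calculations follows; the `Speedup` class and its method names are made up for illustration and are not part of the Avro perf project:

```csharp
using System;

// Hypothetical helper, not part of Avro: two ways to express a timing change.
public static class Speedup
{
    // Fraction of wall-clock time saved: (before - after) / before.
    public static double TimeReduction(double beforeMs, double afterMs)
    {
        if (beforeMs <= 0) throw new ArgumentOutOfRangeException(nameof(beforeMs));
        return (beforeMs - afterMs) / beforeMs;
    }

    // Throughput ratio: how many times faster the "after" run is.
    public static double Ratio(double beforeMs, double afterMs)
    {
        if (afterMs <= 0) throw new ArgumentOutOfRangeException(nameof(afterMs));
        return beforeMs / afterMs;
    }
}
```

With the `wide default_generic deserialize` timings, `Speedup.TimeReduction(6235, 4640)` is about 0.256 (roughly 26% less time) and `Speedup.Ratio(6235, 4640)` is about 1.34.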
Btw, `PreresolvingDatumReader` does the `if
(!ReaderSchema.CanRead(WriterSchema))` check in the constructor and not in the
Read function, which seems to be the correct approach.
However, there are other places where CanRead is called, so caching might
make sense there for the other types as well.
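To illustrate the caching idea, here is a rough sketch that remembers the last writer schema checked, so repeated checks against the same instance become a reference comparison. `DemoSchema` and `CanReadCache` are hypothetical stand-ins written for this example, not the Avro `Schema` API, and the sketch is not thread-safe:

```csharp
using System;

// Hypothetical stand-in for a schema type; NOT the Avro.Schema API.
public sealed class DemoSchema
{
    public string Name { get; }

    // Counts how often the (pretend) expensive comparison actually runs.
    public int CanReadCalls { get; private set; }

    public DemoSchema(string name) => Name = name;

    // Stand-in for an expensive structural comparison.
    public bool CanRead(DemoSchema writer)
    {
        CanReadCalls++;
        return Name == writer.Name;
    }
}

// Remembers the result for the last writer instance that was checked.
public sealed class CanReadCache
{
    private readonly DemoSchema _reader;

    // Sentinel object that never equals a real schema reference.
    private object _lastWriter = new object();
    private bool _lastResult;

    public CanReadCache(DemoSchema reader) =>
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));

    public bool CanRead(DemoSchema writer)
    {
        if (writer == null) throw new ArgumentNullException(nameof(writer));

        // Only recompute when a different writer instance is seen.
        if (!ReferenceEquals(writer, _lastWriter))
        {
            _lastResult = _reader.CanRead(writer);
            _lastWriter = writer;
        }
        return _lastResult;
    }
}
```

Calling `cache.CanRead(writer)` in a loop with the same `writer` instance runs the underlying comparison once. Whether a single-entry cache like this is enough for the other call sites would need measuring.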
Issue Time Tracking
-------------------
Worklog Id: (was: 744796)
Time Spent: 1h 20m (was: 1h 10m)
> C# reader performance improvement
> ---------------------------------
>
> Key: AVRO-1438
> URL: https://issues.apache.org/jira/browse/AVRO-1438
> Project: Apache Avro
> Issue Type: Improvement
> Components: csharp
> Affects Versions: 1.7.5
> Reporter: David Taylor
> Priority: Minor
> Labels: pull-request-available
> Attachments: RecordSchema.cs.diff
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> GenericReader/SpecificReader spend a lot of time comparing the reader/writer
> schema. Remembering the last good match speeds things up about 15% in my
> tests using the avro.perf project for timings. This does not impact the
> DatumReader implementation.