[ 
https://issues.apache.org/jira/browse/AVRO-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855195#comment-16855195
 ] 

Brian Lachniet commented on AVRO-2396:
--------------------------------------

Wow, [~wangyingmm], you are definitely on to something here! I think you just 
found the reason we had to disable [2 tests for performance 
reasons|https://github.com/apache/avro/commit/9a05375436385b7fce75d32cfdcd359e279e0628#diff-c0ad6e8cff4781e16e0df0f95b0c6336R474]
 a few months ago.

I did some local testing and found that we seem to be spending too much time in 
the [ObjectCreator.GetType(string, 
Schema.Type)|https://github.com/apache/avro/blob/d88900ce2eebd2200a9f1a6912719854fba7a102/lang/csharp/src/apache/main/Specific/ObjectCreator.cs#L264]
 method. This is called from {{ObjectCreator.CreateInstance}}, which 
{{SpecificDefaultReader}} and other portions of the code call once per 
new record. I made a slight change in {{SpecificDefaultReader}} to cache the 
result of that {{GetType}} invocation and call {{Activator.CreateInstance}} 
directly. After the change, one of the tests that took 30s dropped to 300ms!
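
To illustrate the idea (this is not the actual patch, just a minimal sketch), the caching could look roughly like the following. {{CachedTypeFactory}}, {{ResolveSlow}} and {{SlowLookups}} are hypothetical names made up for this example and are not part of Avro; the assembly scan merely stands in for the expensive lookup that {{ObjectCreator.GetType}} performs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper, not part of Apache Avro: resolves a type name once
// and reuses the cached Type for every later record.
public sealed class CachedTypeFactory
{
    private readonly ConcurrentDictionary<string, Type> cache =
        new ConcurrentDictionary<string, Type>();

    private int slowLookups;

    // Number of times the expensive lookup actually ran.
    public int SlowLookups => slowLookups;

    public object Create(string typeName)
    {
        // Only the first call per type name pays for the reflection scan.
        Type type = cache.GetOrAdd(typeName, ResolveSlow);
        return Activator.CreateInstance(type);
    }

    // Stand-in for the expensive lookup: scans every loaded assembly.
    private Type ResolveSlow(string typeName)
    {
        Interlocked.Increment(ref slowLookups);
        foreach (var assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type found = assembly.GetType(typeName);
            if (found != null)
            {
                return found;
            }
        }
        throw new TypeLoadException("Unable to find type " + typeName);
    }
}
```

With something like this, creating 5000 instances of the same type from a single thread should trigger the assembly scan only once; every later call is a dictionary lookup plus {{Activator.CreateInstance}}.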

I'm going to go through and re-evaluate all usages of 
{{ObjectCreator.GetType(string, Schema.Type)}} and 
{{ObjectCreator.CreateInstance}}. Thank you for this report [~wangyingmm]!

> Huge performance regression on SpecificDatumReader for array reading
> --------------------------------------------------------------------
>
>                 Key: AVRO-2396
>                 URL: https://issues.apache.org/jira/browse/AVRO-2396
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: csharp
>    Affects Versions: 1.9.0
>            Reporter: wangying
>            Assignee: Brian Lachniet
>            Priority: Blocker
>
> The company where I work as a .NET developer uses the Avro format for 
> messages.
> Recently, after upgrading to 1.9.0-rc2 and 1.9.0-rc4, we saw a huge regression 
> when reading array objects.
> Our test case reads an ETP-defined object, "Energistics.Datatypes.ChannelData", 
> which contains 5000 data items. With the previous Avro version it took only 
> ~300 ms to read the data, while with the latest version it takes 1+ min. 
> (The protocol can be found at 
> [https://www.energistics.org/etp-developers-users/])
>  
> After looking through the code, I think this is caused by a change in the 
> SpecificRecordAccess class:
> public object CreateRecord(object reuse)
> {
>     return reuse ?? ObjectCreator.Instance.New(typeName, Schema.Type.Record);
> }
> Here, reuse is null, so ObjectCreator.Instance.New is run 5000 times, 
> and each time it uses reflection to look up the specific type.
>  
> The previous version only did this in the constructor:
> private class SpecificRecordAccess : RecordAccess
> {
>     private ObjectCreator.CtorDelegate objCreator;
>
>     public SpecificRecordAccess(RecordSchema readerSchema)
>     {
>         objCreator = GetConstructor(readerSchema.Fullname, Schema.Type.Record);
>     }
>
>     public object CreateRecord(object reuse)
>     {
>         return reuse ?? objCreator();
>     }
> }
>  
> I tried to work around this by passing a reuse object to public T 
> Read(T reuse, Decoder decoder), but that does not work either, because 
> SpecificDatumReader doesn't pass it through in the method below:
> public void AddElements(object array, int elements, int index, ReadItem 
> itemReader, Decoder decoder, bool reuse, object reuseobj)
> {
>     var list = (IList)array;
>     for (int i = 0; i < elements; i++)
>     {
>         list.Add(itemReader(null, decoder));
>     }
> }
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
