[ 
https://issues.apache.org/jira/browse/AVRO-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874867#action_12874867
 ] 

Kevin Oliver commented on AVRO-557:
-----------------------------------

Nice. Big improvement.

{code}
  private static class GenericReaderOneTimeUsageDirectDecoderNoResolverTest extends GenericReaderOneTimeUsageTest {
    private final DecoderFactory factory;

    protected GenericReaderOneTimeUsageDirectDecoderNoResolverTest() throws IOException {
      super("GenericReaderOneTimeUsageDirectDecoderNoResolverTest");
      factory = new DecoderFactory().configureDecoderBufferSize(256);
    }

    @Override protected DatumReader<Object> getReader() {
      GenericDatumReaderWithOptionalResolver<Object> reader =
          new GenericDatumReaderWithOptionalResolver<Object>(writerSchema);
      reader.setUseResolvingDecoder(false);
      return reader;
    }

    @Override protected Decoder getDecoder() {
      return factory.createBinaryDecoder(data, null);
    }
  }
{code}

GenericReaderOneTimeUsage12Test: 1899 ms, 2.1932087119840316 million entries/sec.  0.010264627357179553 million bytes/sec
GenericReaderOneTimeUsage13Test: 13182 ms, 0.3160544030781795 million entries/sec.  0.0014791937741568462 million bytes/sec
GenericReaderOneTimeUsageDirectDecoderNoResolverTest: 902 ms, 4.61638520363505 million entries/sec.  0.02160554697489103 million bytes/sec
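
To put the three timings side by side, here is a rough sketch that derives relative speedups from the elapsed times reported above. The class and method names are made up for illustration; they are not part of AVRO-557.patch or the perf harness.

```java
// Hypothetical helper, not part of the patch: compares the elapsed times
// printed by the three tests above.
final class TimingComparisonSketch {
    /** Returns how many times faster the candidate run was than the baseline run. */
    static double speedup(long baselineMillis, long candidateMillis) {
        if (candidateMillis <= 0) {
            throw new IllegalArgumentException("candidateMillis must be positive");
        }
        return (double) baselineMillis / candidateMillis;
    }

    public static void main(String[] args) {
        // Elapsed times copied from the results above.
        long usage12Millis = 1899;
        long usage13Millis = 13182;
        long noResolverMillis = 902;

        System.out.printf("13Test -> NoResolverTest: %.1fx%n",
            speedup(usage13Millis, noResolverMillis));
        System.out.printf("12Test -> NoResolverTest: %.1fx%n",
            speedup(usage12Millis, noResolverMillis));
        System.out.printf("13Test -> 12Test: %.1fx%n",
            speedup(usage13Millis, usage12Millis));
    }
}
```

By this measure the no-resolver run is not merely on par with GenericReaderOneTimeUsage12Test but about twice as fast, which is the gap the question below is about.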


But... Given what you said, shouldn't 
GenericReaderOneTimeUsageDirectDecoderNoResolverTest have the same performance 
as GenericReaderOneTimeUsage12Test? Mind you, I'm happy with this improvement, 
just trying to see what the difference is.

Ok, so it looks like one half of this issue is resolved -- thanks Scott. On to 
GenericDatumReader. I'd prefer not to do caching -- as you say, the hashes on 
schemas are expensive. (I did separately push for schemas to be immutable so 
that their hashcodes could be memoized, but that hasn't been done yet.)
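
To illustrate what memoizing the hashcode of an immutable schema could look like, here is a minimal sketch. The class below is a hypothetical stand-in and is not Avro's Schema class; it only shows the lazy-caching idiom that java.lang.String uses for its own hash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable schema; not Avro's Schema class.
final class ImmutableSchemaSketch {
    private final String name;
    private final List<String> fieldNames;
    private int hash; // 0 means "not computed yet", as in java.lang.String

    ImmutableSchemaSketch(String name, List<String> fieldNames) {
        this.name = name;
        // Defensive copy so later changes to the caller's list cannot alter the hash.
        this.fieldNames = Collections.unmodifiableList(new ArrayList<String>(fieldNames));
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {
            // The expensive walk happens at most once per instance
            // (unless the computed value happens to be 0).
            h = 31 * name.hashCode() + fieldNames.hashCode();
            hash = h; // benign race: every thread computes the same value
        }
        return h;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableSchemaSketch)) return false;
        ImmutableSchemaSketch that = (ImmutableSchemaSketch) o;
        return name.equals(that.name) && fieldNames.equals(that.fieldNames);
    }
}
```

This only works because nothing can change after construction; with mutable schemas the cached value could go stale.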

I dug into this change a bit, and it looks like AVRO-388 added ResolvingDecoders 
to GenericDatumReader. If you look at version 1 of that patch 
(https://issues.apache.org/jira/secure/attachment/12431862/AVRO-388.patch), the 
resolver was optional: it was only used when the actual and expected schemas 
differed. 

> Speed up one-time data decoding
> -------------------------------
>
>                 Key: AVRO-557
>                 URL: https://issues.apache.org/jira/browse/AVRO-557
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.3.2
>            Reporter: Kevin Oliver
>            Assignee: Kevin Oliver
>             Fix For: 1.4.0
>
>         Attachments: AVRO-557.patch
>
>
> There are big gains to be had in performance when using a BinaryDecoder and a 
> GenericDatumReader just one time. This is due to the relatively expensive 
> parsing and initialization that came with 1.3. Patch with example code and a 
> Perf harness to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
