[ 
https://issues.apache.org/jira/browse/AVRO-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tobi Vollebregt updated AVRO-2147:
----------------------------------
    Attachment: TestProtobufPerf.java

> Proto to Avro serialization is unnecessarily slow due to repeated schema 
> creation
> ---------------------------------------------------------------------------------
>
>                 Key: AVRO-2147
>                 URL: https://issues.apache.org/jira/browse/AVRO-2147
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.1, 1.8.2
>            Reporter: Tobi Vollebregt
>            Priority: Major
>              Labels: java, optimization, protobuf
>         Attachments: AVRO-2147.patch, TestProtobufPerf.java, test.proto
>
>
> Hi,
> I discovered that proto to avro serialization is unnecessarily slow in 
> certain cases due to repeated schema creation. Specifically, this slowness 
> shows when serializing protocol buffer messages that contain nested protocol 
> buffer messages that contain enums with many possible values. Some profiling 
> showed this is due to the {{Schema}} objects for the nested message/enum not 
> being cached in this case.
> An example that reproduces this is to add the following to {{test.proto}}:
> {{message Foo {}}
>  {{  ...}}
>  {{  optional MessageWithLargeEnum bar = 21;}}
>  {{}}}
>  {{message MessageWithLargeEnum {}}
>  {{  optional LargeEnum enum = 1;}}
>  {{}}}
>  {{enum LargeEnum {}}
>  {{  AA = 1;}}
>  {{  AB = 2;}}
>  {{  AC = 3;}}
>  {{  ...}}
>  {{  ZZ = 676;}}
>  {{}}}
> Then, a test like the following will exhibit the slow behavior:
> {{@Test public void perf() throws Exception {}}
>  {{  Foo.Builder builder = Foo.newBuilder();}}
>  {{  builder.setInt32(0);}}
>  {{  builder.setInt64(2);}}
>  {{  builder.setUint32(3);}}
>  {{  builder.setUint64(4);}}
>  {{  builder.setSint32(5);}}
>  {{  builder.setSint64(6);}}
>  {{  builder.setFixed32(7);}}
>  {{  builder.setFixed64(8);}}
>  {{  builder.setSfixed32(9);}}
>  {{  builder.setSfixed64(10);}}
>  {{  builder.setFloat(1.0F);}}
>  {{  builder.setDouble(2.0);}}
>  {{  builder.setBool(true);}}
>  {{  builder.setString("foo");}}
>  {{  builder.setBytes(ByteString.copyFromUtf8("bar"));}}
>  {{  builder.setEnum(org.apache.avro.protobuf.Test.A.X);}}
>  {{  builder.addIntArray(27);}}
>  {{  builder.addSyms(org.apache.avro.protobuf.Test.A.Y);}}
>  {{   
> builder.setBar(MessageWithLargeEnum.newBuilder().setEnum(LargeEnum.AA));}}
> {{  Foo objToConvert = builder.build();}}
> {{  Schema schema = ProtobufData.get().getSchema(Foo.class);}}
>  {{  ByteArrayOutputStream bao = new ByteArrayOutputStream();}}
>  {{  Encoder e = EncoderFactory.get().binaryEncoder(bao, null);}}
>  {{  ProtobufDatumWriter<Foo> w = new ProtobufDatumWriter<Foo>(schema);}}
>  {{  GenericDatumReader gdr = new GenericDatumReader(schema, schema);}}
>  {{  BinaryDecoder d = null;}}
> {{  long startTime = System.nanoTime();}}
>  {{  for (int i = 0; i < 1000000; ++i) {}}
>  {{    bao.reset();}}
>  {{    w.write(objToConvert, e);}}
>  {{    e.flush();}}
>  {{    d = DecoderFactory.get().binaryDecoder(bao.toByteArray(), d);}}
>  {{     gdr.read(null, d);}}
>  \{{  }}}
>  {{  long endTime = System.nanoTime();}}
>  {{  System.out.println("Elapsed: " + (endTime - startTime) / 1000000 + " 
> ms");}}
>  {{}}}
> I will attach a patch that optimizes this.
> With the attached patch this test reports a runtime of about 4 seconds, while 
> the runtime without the patch is 30+ seconds, so this is an 7.5-8x 
> improvement for this particular test enum.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to