[ https://issues.apache.org/jira/browse/AVRO-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805765#action_12805765 ]
Thiruvalluvan M. G. commented on AVRO-383:
------------------------------------------

bq. Quick question: is it possible to avoid ResolvingParserGenerator.java::encode() having a full switch on all AVRO types? Or is the reason that you can't use JsonDecoder here because this is required for JsonDecoder to work?

It is relatively easy to convert the JsonNode into a byte array and then read it back using JsonDecoder. The main reason I didn't want to use JsonDecoder is that it decodes in streaming mode: it does not read whole JSON objects into memory before decoding their contents. This way, we can decode very large objects without consuming much memory. The main drawback of this approach is that records must encode their fields in the same order as in the schema. We left this limitation in JsonDecoder because we didn't want to buffer things. But with default values, you can't expect the authors of a schema to write the default values in field order. Even if they do, if the schema file passes through some JSON tools, the order is not guaranteed to be preserved.

bq. Another way to implement ResolvingParserGenerator.java::encode() might be to call GenericData#defaultFieldValue() then use GenericDatumWriter to encode this as binary. This might be a bit slower, but it would use a lot less code, and this is done at schema-compilation time, so shouldn't be too performance critical.

I guess you meant GenericDatumReader#defaultFieldValue(). I did consider using the defaultFieldValue() function; in fact, ResolvingParserGenerator's encode() is modeled after defaultFieldValue(). The reason I didn't use it was that I didn't want to create a mess of dependencies. At present, the dependencies are layered: Schema is at layer 0, {BinaryEncoder, BinaryDecoder} at layer 1, the classes in io.parsing at layer 2, and the rest of the "advanced" encoders and decoders of the io package at layer 3. Everything else is at layer 4 and above.
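To make the layering point concrete, here is a minimal sketch of a schema-driven default encoder that depends only on a schema (layer 0) and binary encoding primitives (layer 1). It is hypothetical and is not the code in AVRO-383.patch: MiniSchema, Type, DefaultValueEncoder and encodeDefault() are invented for illustration, the default is modeled as plain Java Map/String/Integer objects rather than a Jackson JsonNode, and only a few types are covered. The byte layout follows the Avro binary encoding (zig-zag varints for int/long, length-prefixed UTF-8 for strings, record fields concatenated in schema order).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: pre-encode a default value into Avro binary once,
 * at schema-resolution time. MiniSchema stands in for org.apache.avro.Schema.
 */
final class DefaultValueEncoder {

    enum Type { NULL, BOOLEAN, INT, LONG, STRING, RECORD }

    /** Minimal schema node: a primitive type, or a record with ordered fields. */
    static final class MiniSchema {
        final Type type;
        final LinkedHashMap<String, MiniSchema> fields; // RECORD only; insertion order = schema order

        MiniSchema(Type type) { this(type, null); }

        MiniSchema(Type type, LinkedHashMap<String, MiniSchema> fields) {
            this.type = type;
            this.fields = fields;
        }
    }

    /** Avro writes int and long as zig-zag encoded variable-length integers. */
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Encodes value according to schema; records are written in schema field order. */
    static void encode(MiniSchema schema, Object value, ByteArrayOutputStream out) {
        switch (schema.type) {
            case NULL:
                break; // null occupies zero bytes
            case BOOLEAN:
                out.write(((Boolean) value) ? 1 : 0);
                break;
            case INT:
            case LONG:
                writeLong(((Number) value).longValue(), out);
                break;
            case STRING: {
                byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
                writeLong(utf8.length, out); // length prefix
                out.write(utf8, 0, utf8.length);
                break;
            }
            case RECORD: {
                @SuppressWarnings("unchecked")
                Map<String, Object> obj = (Map<String, Object>) value;
                // Walk the schema's fields, not the default's keys, so the order in
                // which the default was written in the schema file does not matter.
                for (Map.Entry<String, MiniSchema> f : schema.fields.entrySet()) {
                    encode(f.getValue(), obj.get(f.getKey()), out);
                }
                break;
            }
        }
    }

    /** Produces the cached binary form of a default value. */
    static byte[] encodeDefault(MiniSchema schema, Object defaultValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(schema, defaultValue, out);
        return out.toByteArray();
    }
}
```

Because encode() iterates the schema's fields rather than the default's keys, a record default written as {"b": "hi", "a": 1} for a schema declaring a before b still yields the bytes for a first, which sidesteps the field-order limitation of a streaming JSON decoder.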
Throwing the generic package into the mix would make this look a lot more complicated. There is a proposal to implement GenericDatumReader using ResolvingDecoder; using GenericDatumReader in ResolvingDecoder would then create a circular dependency. I also looked at refactoring defaultFieldValue() into a common place and using it in both GenericDatumReader and ResolvingDecoder. But defaultFieldValue() calls virtual methods of GenericDatumReader, which give users hooks to customize its behavior.

> Optimizing ResolvingDecoder for default values
> -----------------------------------------------
>
>                 Key: AVRO-383
>                 URL: https://issues.apache.org/jira/browse/AVRO-383
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-383-test.patch, AVRO-383.patch
>
>
> When the reader's and writer's schemas are records, and the reader's schema
> has a field with a default value that the writer's schema doesn't have,
> the ResolvingDecoder keeps the default value in a byte array. This
> byte array is in JSON format. Moving it to Avro binary format improves
> performance.
> Apply the test patch and try "Perf -M". Then apply the patch and run it
> again. On my machine, performance is three times that of the original.
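The read side of the idea in the issue description (keep the default pre-encoded as Avro binary and decode it like ordinary data instead of re-parsing JSON for every record) might be sketched as follows. This is a hypothetical stand-in, not Avro's ResolvingDecoder or BinaryDecoder: CachedDefaultReader, open(), readLong() and readString() are invented names, and the sketch makes no attempt to reproduce the reported speedup. The byte layout it expects is the Avro binary encoding for int/long and string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: when the writer's schema lacks a field, serve the
 * reader's default from bytes pre-encoded in Avro binary, using the same
 * primitive decoders that handle real data.
 */
final class CachedDefaultReader {
    private final byte[] encodedDefault; // computed once, at schema-resolution time

    CachedDefaultReader(byte[] encodedDefault) {
        this.encodedDefault = encodedDefault;
    }

    /** Returns a fresh cursor over the cached bytes for each record that needs the default. */
    ByteBuffer open() {
        return ByteBuffer.wrap(encodedDefault);
    }

    /** Decodes an Avro zig-zag varint (int or long). */
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift; // accumulate 7 bits at a time, little-endian groups
            shift += 7;
        } while ((b & 0x80) != 0); // high bit set means more bytes follow
        return (z >>> 1) ^ -(z & 1); // undo zig-zag
    }

    /** Decodes a length-prefixed UTF-8 string. */
    static String readString(ByteBuffer in) {
        int len = (int) readLong(in);
        byte[] utf8 = new byte[len];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

For a record default {a: 1, b: "hi"} cached as the bytes 2, 4, 'h', 'i', each record that needs the default calls open() and then reads a with readLong() and b with readString(); no JSON parsing happens on the hot path.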