[jira] Commented: (AVRO-383) Optimizing ResolvingDecoder for default values

Thiruvalluvan M. G. (JIRA) Wed, 27 Jan 2010 19:55:58 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805765#action_12805765
 ]


Thiruvalluvan M. G. commented on AVRO-383:
------------------------------------------

bq. Quick question: is it possible to avoid 
ResolvingParserGenerator.java::encode() having a full switch on all AVRO types? 
Or is the reason that you can't use JsonDecoder here because this is required 
for JsonDecoder to work?

It is relatively easy to convert the JsonNode into a byte array and then read 
it back using JsonDecoder. The main reason I didn't want to use JsonDecoder is 
that it decodes in a streaming mode: it does not read whole Json objects into 
memory before decoding their contents. This way we can decode very large 
objects without consuming too much memory. The main drawback of this approach 
is that records must encode their fields in the same order as the schema. We 
left this limitation in the JsonDecoder because we didn't want to buffer 
things.
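
To illustrate the field-order limitation, here is a minimal sketch of a reader that consumes fields one at a time without buffering. It is not Avro's JsonDecoder: the class StreamingFieldReader and its methods are hypothetical, and a list of name/value pairs stands in for the Json token stream.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Avro's JsonDecoder: a reader that consumes
// (name, value) pairs one at a time and never buffers them.
final class StreamingFieldReader {
    private final Iterator<String[]> tokens; // each entry is {fieldName, value}

    StreamingFieldReader(List<String[]> input) {
        this.tokens = input.iterator();
    }

    // Reads the next field. It must be the one the schema expects next,
    // because earlier tokens are gone and later ones have not been seen yet.
    String readField(String expectedName) {
        if (!tokens.hasNext()) {
            throw new IllegalStateException("Missing field: " + expectedName);
        }
        String[] token = tokens.next();
        if (!token[0].equals(expectedName)) {
            throw new IllegalStateException(
                "Expected field '" + expectedName + "' but found '" + token[0] + "'");
        }
        return token[1];
    }

    // Returns true if every schema field can be read in schema order.
    static boolean canDecode(List<String> schemaOrder, List<String[]> input) {
        StreamingFieldReader reader = new StreamingFieldReader(input);
        try {
            for (String name : schemaOrder) {
                reader.readField(name);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With a schema order of id, name, input that lists name before id is rejected, even though it carries the same data. Accepting it would require holding the whole record in memory first.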

But with default values, you can't expect the authors of the schema to write 
the default values in the schema's field order. Even if they do, if the schema 
file is passed through some Json tools, the order is not guaranteed to be 
preserved.
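
The order-independent alternative can be sketched as follows: hold the default value in memory, walk the fields in schema order, and look each one up by name while writing binary. This is only an illustration under stated assumptions. DefaultValueEncoder is a hypothetical name, a Map stands in for the JsonNode tree, only int fields are handled, and DataOutputStream.writeInt writes fixed 4-byte integers rather than Avro's variable-length encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: encodes an in-memory default value in schema order,
// regardless of the order in which the defaults were written down.
final class DefaultValueEncoder {
    static byte[] encode(List<String> schemaFieldOrder, Map<String, Integer> defaults) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            // The schema drives the iteration; the defaults are only looked up.
            for (String field : schemaFieldOrder) {
                Integer value = defaults.get(field);
                if (value == null) {
                    throw new IllegalArgumentException("No default for field: " + field);
                }
                out.writeInt(value); // fixed 4 bytes, NOT Avro's zig-zag varint
            }
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Two maps holding the same defaults in different insertion orders produce identical bytes, which is the property a streaming decoder cannot offer.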

bq. Another way to implement ResolvingParserGenerator.java::encode() might be 
to call GenericData#defaultFieldValue() then use GenericDatumWriter to encode 
this as binary. This might be a bit slower, but it would use a lot less code, 
and this is done at schema-compilation time, so shouldn't be too performance 
critical.

I guess you meant GenericDatumReader#defaultFieldValue(). I did consider using 
the defaultFieldValue() function. In fact, ResolvingParserGenerator's encode() 
is modeled after defaultFieldValue(). The reason I didn't do so was that I 
didn't want to create a mess of dependencies. Presently, the dependency 
layering is: Schema at layer 0, { BinaryEncoder, BinaryDecoder } at layer 1, 
the classes in io.parsing at layer 2, and the rest of the "advanced" encoders 
and decoders of the io package at layer 3. Everything else is at layer 4 or 
above. Throwing the generic package into the mix would make this a lot more 
complicated.

There is a proposal to implement GenericDatumReader using ResolvingDecoder. 
Using GenericDatumReader in ResolvingDecoder would then cause a circular 
dependency.
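
The layering argument can be sketched as a simple rule: a component may depend only on components in strictly lower layers. The class LayerCheck is hypothetical and is not part of Avro. The layer numbers for Schema, BinaryDecoder, io.parsing and ResolvingDecoder follow the description above, while placing GenericDatumReader at layer 4 is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the layering rule; not part of Avro.
final class LayerCheck {
    private final Map<String, Integer> layerOf = new LinkedHashMap<>();

    LayerCheck place(String component, int layer) {
        layerOf.put(component, layer);
        return this;
    }

    // A dependency is allowed only from a higher layer to a strictly lower one.
    boolean mayDependOn(String from, String to) {
        return layer(from) > layer(to);
    }

    private int layer(String component) {
        Integer layer = layerOf.get(component);
        if (layer == null) {
            throw new IllegalArgumentException("Unknown component: " + component);
        }
        return layer;
    }

    static LayerCheck describedLayers() {
        return new LayerCheck()
            .place("Schema", 0)
            .place("BinaryDecoder", 1)
            .place("io.parsing", 2)
            .place("ResolvingDecoder", 3)
            .place("GenericDatumReader", 4); // assumed position in "everything else"
    }
}
```

Under this rule GenericDatumReader may use ResolvingDecoder, but neither ResolvingDecoder nor anything in io.parsing may use GenericDatumReader. Allowing both directions at once is exactly the cycle described above.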

I also looked at refactoring defaultFieldValue() into some common place and 
using it from both GenericDatumReader and ResolvingDecoder. But 
defaultFieldValue() calls virtual methods of GenericDatumReader, which give 
users hooks to customize its behavior, so it cannot simply be moved out.
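
A minimal sketch of why the virtual hooks get in the way. All class and method names here are hypothetical stand-ins and do not reproduce GenericDatumReader's actual signatures. The point is only that a shared static helper cannot see a subclass's overrides.

```java
// Hypothetical stand-in for a reader whose default-value logic calls an
// overridable hook.
class DefaultingReader {
    // Hook: subclasses choose the in-memory representation of strings.
    protected Object createString(String value) {
        return value;
    }

    // Dispatches through the hook, so subclass overrides take effect.
    Object defaultFieldValue(String text) {
        return createString(text);
    }
}

// A user customization that represents strings as StringBuilder.
class BuilderReader extends DefaultingReader {
    @Override
    protected Object createString(String value) {
        return new StringBuilder(value);
    }
}

// The "common place" refactoring: a static helper has no reader instance
// to dispatch on, so it can only hard-code one representation.
final class SharedDefaults {
    static Object defaultFieldValue(String text) {
        return text;
    }
}
```

Here BuilderReader yields a StringBuilder for a default, while SharedDefaults always yields a String. Moving the logic to a common place would either drop the customization or require passing the reader back in, which reintroduces the dependency.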


> Optimizing ResolvingDecoder for default values
> ----------------------------------------------
>
>                 Key: AVRO-383
>                 URL: https://issues.apache.org/jira/browse/AVRO-383
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-383-test.patch, AVRO-383.patch
>
>
> When the reader's and writer's schemas are records and the reader's schema 
> has a field with default value and the writer's schema doesn't have the 
> field, the ResolvingDecoder keeps the default value in a byte array. This 
> byte array is in Json format. Moving this to Avro binary format improves 
> performance.
> Apply the test patch and try "Perf -M". Then apply the patch and run it 
> again. On my machine, the performance is three times the original.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
