[ https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-29:
------------------------------------

    Attachment: AVRO-29.patch

This patch has a more complete implementation of the validating and resolving
reader/writers. It has been adapted to the recent changes to readers and
writers in JIRA AVRO-25.

The ResolvingValueReader supports the following rules for resolution:

  - long can read int.
  - double can read int, long, and float.
  - Fixed matches only if the sizes match.
  - Enums match by symbol (if the writer defines 3 for symbol "x" and the
reader defines 5 for "x", then a value 3 in the stream will return 5 to the
caller).
  - If the writer has a union, it matches the reader's non-union type if that
type is one of the branches of the writer's union and the data on the stream
is of that type.
  - If the writer has a non-union type and the reader has a union type with
the writer's type as a branch, the reader sees the union with that branch.
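
To make the numeric, fixed, and enum rules above concrete, here is a minimal
sketch. It is not code from the patch: the class ResolutionSketch, the Prim
enum, and all method names are hypothetical, and only the rules listed above
are encoded (the union rules are left out).

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
final class ResolutionSketch {
    enum Prim { INT, LONG, FLOAT, DOUBLE }

    // True if a reader expecting `reader` may consume a value written as `writer`,
    // using only the promotions listed above.
    static boolean canRead(Prim reader, Prim writer) {
        if (reader == writer) {
            return true;
        }
        switch (reader) {
            case LONG:
                return writer == Prim.INT;
            case DOUBLE:
                return writer == Prim.INT || writer == Prim.LONG || writer == Prim.FLOAT;
            default:
                return false;
        }
    }

    // Fixed types resolve only when both sides declare the same size.
    static boolean fixedMatches(int writerSize, int readerSize) {
        return writerSize == readerSize;
    }

    // Maps each writer ordinal to the reader ordinal of the same symbol,
    // or -1 when the reader does not define that symbol.
    static int[] enumRemap(List<String> writerSymbols, List<String> readerSymbols) {
        int[] remap = new int[writerSymbols.size()];
        for (int i = 0; i < remap.length; i++) {
            remap[i] = readerSymbols.indexOf(writerSymbols.get(i));
        }
        return remap;
    }
}
```

With a writer that lists "x" at position 3 and a reader that lists "x" at
position 5, enumRemap(...)[3] is 5, which is the enum case described above.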

GenericDatumReader is modified to use ValidatingValueReader.

My performance tests show that using ValidatingValueWriter is about 8% slower
than not using it. Using ResolvingValueReader degrades performance by 0% to
8%, depending on the kind of "resolution" used. When there is no resolution
(i.e. reader and writer schemas are identical), it is functionally equivalent
to ValidatingValueReader and the overhead is at its maximum. [These results,
of course, reuse a cached resolving table; that is, the resolving table is not
constructed for every object being decoded.]
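
For readers wondering what caching the resolving table could look like, the
following is one possible sketch and not the mechanism used in the patch. The
class ResolvingTableCache, its type parameters (S for a schema, T for a table),
and the builder function are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

// Hypothetical illustration only: builds a table once per (writer, reader) pair.
final class ResolvingTableCache<S, T> {
    private final Map<Map.Entry<S, S>, T> tables = new ConcurrentHashMap<>();
    private final BiFunction<S, S, T> builder;
    private final AtomicInteger builds = new AtomicInteger();

    ResolvingTableCache(BiFunction<S, S, T> builder) {
        this.builder = builder;
    }

    // Returns the cached table for this schema pair, building it on first use.
    T tableFor(S writer, S reader) {
        return tables.computeIfAbsent(new SimpleImmutableEntry<>(writer, reader), key -> {
            builds.incrementAndGet();
            return builder.apply(key.getKey(), key.getValue());
        });
    }

    // Number of times the builder has actually run.
    int buildCount() {
        return builds.get();
    }
}
```

Decoding many objects with the same schema pair would then call tableFor
repeatedly while the builder runs only once, so the construction cost stays
out of the per-object measurement.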


> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-29.patch, AVRO-29.patch
>
>
> This is a companion to AVRO-25, which introduced the classes ValueOutput and 
> ValueInput.  This patch adds two capabilities: validation of 
> ValueInput/Output calls against a schema, and schema-resolution implemented 
> in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will 
> validate calls against that schema.  For example, if the schema calls for a 
> record consisting of two longs and a double, then ValidatingValueInput will 
> allow the call-sequence readLong, readLong, readDouble and throw an error 
> otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, 
> and automatically performs Avro's schema-resolution logic on behalf of the 
> reader.  For example, if the writer's schema calls for a long, and the 
> reader's calls for a double, then the reader can call readDouble, and 
> ResolvingValueInput will automatically decode the long sent by the writer and 
> convert it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, 
> which also implements Avro's resolution logic.  In many use-cases, the 
> programmer has their own data structures into which they want to store data 
> read from an Avro stream, data structures that cannot easily be put into the 
> GenericRecord/Array class hierarchy.  With ResolvingValueInput, programmers 
> get the benefit of this resolution logic without being forced into the 
> GenericRecord/Array class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of 
> the resolution logic, and that GenericDatumReader be implemented in terms of 
> ResolvingValueInput.  However, we haven't implemented this change pending 
> feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to 
> LL(1) parsing tables.  This translation is straightforward, but tedious.  If 
> you want to understand how the code works, we recommend that you look in the 
> file "parsing.html" (included in the patch), which explains the translation.
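
As a rough illustration of the call-sequence validation described in the issue,
the sketch below checks calls against a flat list of expected kinds. It is a
deliberate simplification and not the patch's approach: the patch translates
schemas to LL(1) parsing tables, whereas this hypothetical
CallSequenceValidator only handles a record of primitive fields and has no
support for nesting, arrays, or unions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; a flat stand-in for a real parsing table.
final class CallSequenceValidator {
    enum Kind { LONG, DOUBLE }

    private final List<Kind> expected;
    private int pos = 0;

    CallSequenceValidator(Kind... expected) {
        this.expected = Arrays.asList(expected);
    }

    // Call these where a real reader would perform readLong / readDouble.
    void onReadLong() {
        advance(Kind.LONG);
    }

    void onReadDouble() {
        advance(Kind.DOUBLE);
    }

    // True once every expected field has been consumed.
    boolean isComplete() {
        return pos == expected.size();
    }

    private void advance(Kind actual) {
        if (pos >= expected.size() || expected.get(pos) != actual) {
            String want = pos < expected.size() ? expected.get(pos).toString() : "end of record";
            throw new IllegalStateException("expected " + want + " but got " + actual);
        }
        pos++;
    }
}
```

For the two-longs-and-a-double record, a validator built with LONG, LONG,
DOUBLE accepts onReadLong, onReadLong, onReadDouble and throws
IllegalStateException for any other order.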

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
