[
https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. updated AVRO-29:
------------------------------------
Attachment: AVRO-29.patch
This patch has a more complete implementation of the validating and resolving
readers/writers. It has been adapted to the recent changes to readers and
writers in JIRA AVRO-25.
The ResolvingValueReader supports the following rules for resolution:
- long can read int.
- double can read int, long, and float.
- Fixed matches only if the sizes match.
- Enums match by symbol (if the writer defines 3 for symbol "x" and the reader
defines 5 for "x", then a value 3 in the stream will return 5 to the caller).
- If the writer has a union, it matches the reader's non-union type if that
type is one of the branches of the writer's union and the data on the stream
is of that type.
- If the writer has a non-union type and the reader has a union type with the
writer's type as a branch, the reader sees the union with that branch.
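As a rough illustration of the promotion and enum rules above, here is a
minimal sketch. It is not code from the patch: the class
ResolutionRulesSketch, the enum Kind and the methods canRead and enumMapping
are hypothetical names, and only the promotions listed above are encoded.

```java
import java.util.Arrays;
import java.util.List;

public class ResolutionRulesSketch {
    /** Hypothetical stand-in for the numeric schema types named in the rules. */
    enum Kind { INT, LONG, FLOAT, DOUBLE }

    /** True if a reader expecting `reader` may consume a value written as `writer`. */
    static boolean canRead(Kind reader, Kind writer) {
        if (reader == writer) {
            return true;
        }
        switch (reader) {
            case LONG:
                return writer == Kind.INT;
            case DOUBLE:
                return writer == Kind.INT || writer == Kind.LONG || writer == Kind.FLOAT;
            default:
                return false;
        }
    }

    /**
     * Maps each writer ordinal to the reader ordinal of the same symbol,
     * or -1 when the reader does not define that symbol.
     */
    static int[] enumMapping(List<String> writerSymbols, List<String> readerSymbols) {
        int[] map = new int[writerSymbols.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = readerSymbols.indexOf(writerSymbols.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(canRead(Kind.DOUBLE, Kind.LONG));
        System.out.println(canRead(Kind.INT, Kind.LONG));

        // Writer defines "x" at ordinal 3, reader defines "x" at ordinal 5.
        List<String> writer = Arrays.asList("a", "b", "c", "x");
        List<String> reader = Arrays.asList("p", "q", "r", "s", "t", "x");
        System.out.println(enumMapping(writer, reader)[3]);
    }
}
```

Precomputing the ordinal mapping once per schema pair means that decoding an
enum value is a single array lookup.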
GenericDatumReader is modified to use ValidatingValueReader.
My performance tests show that using ValidatingValueWriter is about 8% slower
than not using it. Using ResolvingValueReader degrades performance by 0% to
8%, depending on the kind of "resolution" used. When there is no resolution
(i.e. reader and writer schemas are identical), it is functionally equivalent
to ValidatingValueReader and the overhead is at its maximum. [These results,
of course, cache the resolving table. That is, the resolving table is not
constructed for every object being decoded.]
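One way to cache the resolving table per schema pair is sketched below. This
is an assumption about how such a cache could look, not the patch's code:
ResolvingTableCacheSketch, ResolvingTable and tableFor are hypothetical names,
and schemas are represented as plain strings to keep the example
self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ResolvingTableCacheSketch {
    /** Hypothetical placeholder for the table built from a writer/reader schema pair. */
    static final class ResolvingTable {
        final String writerSchema;
        final String readerSchema;

        ResolvingTable(String writerSchema, String readerSchema) {
            this.writerSchema = writerSchema;
            this.readerSchema = readerSchema;
        }
    }

    /** Counts how many tables were actually constructed. */
    static final AtomicInteger builds = new AtomicInteger();

    private static final Map<List<String>, ResolvingTable> CACHE = new ConcurrentHashMap<>();

    /** Returns the cached table for this schema pair, building it on first use only. */
    static ResolvingTable tableFor(String writerSchema, String readerSchema) {
        List<String> key = Arrays.asList(writerSchema, readerSchema);
        return CACHE.computeIfAbsent(key, k -> {
            builds.incrementAndGet();
            return new ResolvingTable(writerSchema, readerSchema);
        });
    }

    public static void main(String[] args) {
        // Decoding many objects with the same schema pair reuses one table.
        for (int i = 0; i < 1000; i++) {
            tableFor("\"long\"", "\"double\"");
        }
        System.out.println("tables built: " + builds.get());
    }
}
```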
> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>
> Key: AVRO-29
> URL: https://issues.apache.org/jira/browse/AVRO-29
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Thiruvalluvan M. G.
> Attachments: AVRO-29.patch, AVRO-29.patch
>
>
> This is a companion to AVRO-25, which introduced the classes ValueOutput and
> ValueInput. This patch adds two capabilities: validation of
> ValueInput/Output calls against a schema, and schema-resolution implemented
> in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will
> validate calls against a schema. For example, if the schema calls for a
> record consisting of two longs and a double, then ValidatingValueInput will
> allow the call-sequence readLong, readLong, readDouble and throw an error
> otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema,
> and automatically performs Avro's schema-resolution logic on behalf of the
> reader. For example, if the writer's schema calls for a long, and the
> reader's calls for a double, then the reader can call readDouble, and
> ResolvingValueInput will automatically decode the long sent by the writer and
> convert it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader,
> which also implements Avro's resolution logic. In many use-cases, the
> programmer has their own data structures into which they want to store data
> read from an Avro stream, data structures that cannot easily be put into the
> GenericRecord/Array class hierarchy. With ResolvingValueInput, programmers
> get the benefit of this resolution logic without being forced into the
> GenericRecord/Array class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of
> the resolution logic, and that GenericDatumReader be implemented in terms of
> ResolvingValueInput. However, we haven't implemented this change pending
> feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to
> LL(1) parsing tables. This translation is straightforward, but tedious. If
> you want to understand how the code works, we recommend that you look in the
> file "parsing.html" (included in the patch), which explains the translation.
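The call-sequence validation described in the issue can be illustrated with a
minimal sketch. It covers only a flat record, where the expected calls form a
fixed sequence of terminals, and does not reproduce the LL(1) tables from the
patch. ValidatingSketch, Symbol and Validator are hypothetical names, and an
Iterator of already-decoded numbers stands in for the underlying stream.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ValidatingSketch {
    /** Terminals a flat record schema can expect, in this simplified model. */
    enum Symbol { LONG, DOUBLE }

    /** Hypothetical validating reader: checks each call against the expected sequence. */
    static final class Validator {
        private final Iterator<? extends Number> values;
        private final List<Symbol> expected;
        private int pos = 0;

        Validator(Iterator<? extends Number> values, Symbol... expected) {
            this.values = values;
            this.expected = Arrays.asList(expected);
        }

        /** Throws if `actual` is not the next call the schema allows. */
        private void advance(Symbol actual) {
            if (pos >= expected.size()) {
                throw new IllegalStateException("unexpected " + actual + " after end of record");
            }
            Symbol want = expected.get(pos);
            if (want != actual) {
                throw new IllegalStateException("expected " + want + " but got " + actual);
            }
            pos++;
        }

        long readLong() {
            advance(Symbol.LONG);
            return values.next().longValue();
        }

        double readDouble() {
            advance(Symbol.DOUBLE);
            return values.next().doubleValue();
        }
    }

    public static void main(String[] args) {
        // Schema: a record of two longs and a double.
        Validator in = new Validator(
                Arrays.<Number>asList(1L, 2L, 3.5).iterator(),
                Symbol.LONG, Symbol.LONG, Symbol.DOUBLE);
        System.out.println(in.readLong() + " " + in.readLong() + " " + in.readDouble());
    }
}
```

Calling readDouble where the schema expects a long throws an
IllegalStateException in this sketch; the exception type is an arbitrary
choice for the example.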
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.