[jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput

Doug Cutting (JIRA) Wed, 17 Jun 2009 14:52:32 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720921#action_12720921
 ]


Doug Cutting commented on AVRO-29:
----------------------------------

Some comments and questions:
- What's the advantage of using ResovlingRecordReader over the existing code in 
GenericDatumReader?  This replaces existing logic with a lot more code, code 
that's harder to understand, IMHO.  It's interesting that it can be done this 
way, but what value does this give applications?  Above Raymie states, "With 
ResolvingValueInput, programmers get the benefit of this resolution logic 
without being forced into the GenericRecord/Array class hierarchy." But 
applications already are not forced into this.  They simply must override 
addField(), getField() and removeField(), as with ResolvingRecordReader.  For 
example, SpecificDatumReader extends GenericDatumReader to read records that do 
not implement GenericRecord.  What am I missing?
 - Similarly, what's the advantage of the ValidatingValueReader/Writer?  It 
feels a bit like belt-and-suspenders.  The existing DatumReader and DatumWriter 
implementations do not perform invalid operations.  This is tested by unit 
tests, so why check it again at runtime for every record, even if it can be 
done relatively cheaply?  We don't expect there to be thousands of DatumReader 
and DatumWriter implementations.  All existing ones extend 
GenericDatumReader/Writer, which should prevent invalid operations, and I have 
a hard time imagining an application which cannot be reasonably implemented by 
extending these.  Again, what am I missing?
- GenericDatumReader: ValidatingValueReader import is not used;
- GenericDatumReader: It's unfortunate that this includes two versions of 
readRecord.
- Might the resolving/parsing classes would better live in an io.validate 
package?
- What should we do with the parsing.html file?  If it's for developers only, 
then it could live with the source code (perhaps alongside the .java files), in 
javadoc or on the wiki.  If we think end-users might be interested, then it 
should go into the forrest-generated docs.

> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-29.patch, AVRO-29.patch
>
>
> This is a companion to AVRO-25, which introduced the classes ValueOutput and 
> ValueInput.  This patch adds two capabilities: validation of 
> ValueInput/Output calls against a schema, and schema-resolution implemented 
> in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will 
> validate calls against a schema.  For example, if the schema calls for a 
> record consisting of two longs and a double, then ValidatingOutput will allow 
> the call-sequence readLong, readLong, readDouble and throw an error otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, 
> and automatically performs Avro's schema-resolution logic on behalf of the 
> reader.  For example, if the writer's schema calls for a long, and the 
> readers calls for a double, then the reader can call readDouble, and 
> ResolvingValueInput will automatically decode the long sent by the writer and 
> convert it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, 
> which also implements Avro's resolution logic.  In many use-cases, the 
> programmer has their own data structures into which they want to store data 
> read from an Avro stream, data structures that cannot easily be put into the 
> GenericRecord/Array class hierarchy.  With ResolvingValueInput, programmers 
> get the benefit of this resolution logic without being forced into the 
> GenericRecord/Array class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of 
> the resolution logic, and that GenericDatumReader be implemented in terms of 
> ResolvingValueInput.  However, we haven't implemented this change pending 
> feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to 
> LL(1) parsing tables.  This translation is straight forward, but tedious.  If 
> you want to understand how the code works, we recommend that you look in the 
> file "parsing.html" (included in the patch), which explains the translation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput

Reply via email to