[ 
https://issues.apache.org/jira/browse/AVRO-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879903#action_12879903
 ] 

Doug Cutting commented on AVRO-580:
-----------------------------------

> I'm not sure it's a good idea to encourage using multiple APIs in the same 
> code.

The goal is actually to reduce the number of APIs.  Folks could use a single 
API (specific) to read & write data.  If they've generated classes for records, 
those classes will be used, otherwise GenericRecord would be used.  The only 
reason to use GenericDatumReader and GenericDatumWriter explicitly would be to 
disable this, to force everything to use the generic representation, which 
might be useful if one must, e.g., walk data generically.
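
The fallback described above can be sketched without any Avro dependency. This is 
only an illustration, not Avro's implementation: RepresentationChooser, 
findGeneratedClass and newRecord are hypothetical names, and a plain HashMap 
stands in for the generic record representation.

```java
import java.util.HashMap;

// Hypothetical helper, not part of Avro: picks a representation per record name.
final class RepresentationChooser {

    /** Returns the class named by the schema's full name, or null if it is not on the classpath. */
    static Class<?> findGeneratedClass(String fullName) {
        try {
            return Class.forName(fullName);
        } catch (ClassNotFoundException e) {
            return null; // no generated class: caller falls back to generic
        }
    }

    /** Creates an instance of the generated class if one exists, otherwise a generic stand-in. */
    static Object newRecord(String fullName) {
        Class<?> generated = findGeneratedClass(fullName);
        if (generated == null) {
            // Assumption: a HashMap plays the role of a generic record here.
            return new HashMap<String, Object>();
        }
        try {
            return generated.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + fullName, e);
        }
    }
}
```

With this shape, callers never choose a representation up front: a name that 
resolves to a class yields an instance of that class, and any other name yields 
the generic stand-in.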

Alternately, instead of modifying SpecificDatumReader & SpecificDatumWriter, we 
could add a third kind of reader/writer that has this behaviour.

> Is this to facilitate making Pair easier to program

It may not even be required by Pair.  Pair can be implemented as a 
manually-written SpecificRecord, so that the specific reader/writer can handle 
it.   The generic writer can write instances of this (since it only requires 
IndexedRecord, which both SpecificRecord and GenericRecord implement) and the 
generic reader could similarly read instances of this, except for instance 
creation.  So the implementation with the fewest lines of code might be a 
subclass of GenericDatumReader that special-cases Pair, used when the user has 
requested generic data.  But this subclass would be equivalent to changing 
SpecificDatumWriter to punt to GenericDatumWriter when it sees an unknown 
class, and that latter implementation is more generally useful.
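
To make the point about positional access concrete, here is a self-contained 
sketch.  It does not use Avro's classes: PositionalRecord is a hypothetical 
stand-in for IndexedRecord (reduced to get/put plus a field count), ArrayRecord 
stands in for a generic record, and PositionalWriter stands in for a writer that 
relies only on that interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexedRecord: fields are accessed by position only.
interface PositionalRecord {
    Object get(int i);
    void put(int i, Object v);
    int fieldCount();
}

// A manually written Pair whose key and value types vary per use.
final class Pair<K, V> implements PositionalRecord {
    private K key;
    private V value;

    Pair(K key, V value) { this.key = key; this.value = value; }

    public Object get(int i) {
        switch (i) {
            case 0: return key;
            case 1: return value;
            default: throw new IndexOutOfBoundsException("no field " + i);
        }
    }

    @SuppressWarnings("unchecked")
    public void put(int i, Object v) {
        switch (i) {
            case 0: key = (K) v; break;
            case 1: value = (V) v; break;
            default: throw new IndexOutOfBoundsException("no field " + i);
        }
    }

    public int fieldCount() { return 2; }
}

// Stand-in for a generic record: an untyped array of field values.
final class ArrayRecord implements PositionalRecord {
    private final Object[] fields;

    ArrayRecord(Object... fields) { this.fields = fields.clone(); }

    public Object get(int i) { return fields[i]; }
    public void put(int i, Object v) { fields[i] = v; }
    public int fieldCount() { return fields.length; }
}

// A writer that needs only positional access, so it handles both kinds of record.
final class PositionalWriter {
    static List<Object> write(PositionalRecord record) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < record.fieldCount(); i++) {
            Object field = record.get(i);
            // Nested records are written recursively, whatever their concrete class.
            out.add(field instanceof PositionalRecord
                    ? write((PositionalRecord) field)
                    : field);
        }
        return out;
    }
}
```

Because the writer never asks which concrete class it was given, a Pair whose 
value is an ArrayRecord is written the same way as one whose value is another 
hand-written class.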

The specific/generic distinction is confusing.  Rather than telling folks 
who're, e.g., writing a mapreduce program that they need to decide which 
representation they'll use, we can tell them that, if they generate code it 
will be used, otherwise generic representations will be used.

> is there a genuine use case when you want to use one API for the first 
> element of the pair, and a different API for the second element?

In reduce logic we'd like to process pairs identically regardless of how their 
keys and values are represented, so using different classes makes for more 
work.  We could have a common interface for the two, but that's hard when one 
would naturally be a GenericRecord and the other a manually written class.

Currently object trees must all be either specific or generic.  By extending 
SpecificRecord, one can intermix manually-written classes with specific 
classes, permitting things like a generic Pair<K,V> that's not easily handled 
by generated code, since the key/value schemas vary.  It seems to me that it'd 
be a feature to be able to intermix generic data into trees too, simplifying 
lots of things.

> java: permit generic data within specific
> -----------------------------------------
>
>                 Key: AVRO-580
>                 URL: https://issues.apache.org/jira/browse/AVRO-580
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>
>
> It should be possible to intermix specific and generic data.  For example, if 
> some fields of a record have specific classes defined, while others do not, 
> the latter should use the generic representation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
