[jira] [Commented] (AVRO-859) Java: Data Flow Overhaul -- Composition and Symmetry

Douglas Creager (JIRA) Thu, 21 Jul 2011 10:49:22 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069101#comment-13069101
 ]


Douglas Creager commented on AVRO-859:
--------------------------------------

There is one possible issue with the “BinaryDecoder source + JsonEncoder 
target” example that you give: I think it will only work easily when you're 
not doing any schema resolution.  This gets back to the push-vs-pull thing I 
mentioned in my previous comment.  In this example, you can either have the 
BinaryDecoder control things and send data into the JsonEncoder as it's 
decoded, or you can have the JsonEncoder control things and pull data from the 
BinaryDecoder as it's needed.  If there's no schema resolution, this works 
great.  Either (a) the BinaryDecoder reads a value, and because there's no 
resolution, that's exactly the value that the JsonEncoder will need next; or 
(b) the JsonEncoder asks for a value, and because there's no schema 
resolution, that's exactly the value that the BinaryDecoder will expect to 
read next from the stream.

If you're doing schema resolution, though, the decoder and encoder will be 
working with different schemas.  And the fields of a record type might be in a 
different order.  If the decoder is pushing data into the encoder, the encoder 
will have to buffer things if it receives a field that isn't the next one that 
it needs to serialize.  And vice versa: if the encoder is pulling data, the 
decoder might have to deserialize and buffer a bunch of intermediate fields 
until it gets to the one that was requested by the encoder.
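
To make the buffering cost concrete, here is a minimal sketch of the push 
case.  It uses plain Java collections rather than the real Avro Decoder and 
Encoder classes; BufferingTarget, run and maxHeld are hypothetical names 
invented for this illustration.  The source pushes fields in its own order, 
and the target has to hold any field that arrives before its turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PushReorderSketch {

    /** Hypothetical push-driven target (not an Avro class). */
    static final class BufferingTarget {
        private final List<String> targetOrder;                      // order the target must write in
        private final Map<String, Object> pending = new HashMap<>(); // fields that arrived early
        private final List<String> written = new ArrayList<>();
        private int maxHeld = 0;

        BufferingTarget(List<String> targetOrder) {
            this.targetOrder = targetOrder;
        }

        /** Called by the source, in the source's field order. */
        void push(String field, Object value) {
            pending.put(field, value);
            // Write out every field that is now next in the target's order.
            while (written.size() < targetOrder.size()
                    && pending.containsKey(targetOrder.get(written.size()))) {
                String name = targetOrder.get(written.size());
                written.add(name + "=" + pending.remove(name));
            }
            // Whatever is still pending had to be buffered.
            maxHeld = Math.max(maxHeld, pending.size());
        }

        String output() { return String.join(",", written); }

        int maxHeld() { return maxHeld; }
    }

    /** Drives the target by pushing values in the source's order. */
    static BufferingTarget run(List<String> sourceOrder, List<String> targetOrder,
                               Map<String, Object> values) {
        BufferingTarget target = new BufferingTarget(targetOrder);
        for (String field : sourceOrder) {
            target.push(field, values.get(field));
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("id", 7);
        values.put("name", "avro");
        values.put("tags", "x");
        List<String> source = Arrays.asList("id", "name", "tags");

        BufferingTarget same = run(source, source, values);
        BufferingTarget reordered = run(source, Arrays.asList("tags", "name", "id"), values);
        System.out.println(same.output() + " held=" + same.maxHeld());
        System.out.println(reordered.output() + " held=" + reordered.maxHeld());
    }
}
```

With identical field orders nothing is ever held; with the reversed order the 
target ends up holding two fields before it can write its first one.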

None of this is a deal-breaker, but it highlights that you really want to 
support both pushing and pulling; ideally in this situation, you'd have the 
decoder push the data into an in-memory representation (doing the schema 
resolution there to be able to skip over any fields that will be dropped).  
That in-memory representation would be the buffering that you use to get around 
field reordering.  And then, as a separate step, you have the encoder pull 
data from the in-memory object.  That way each operation gets to be written 
either as push or pull, whichever is most natural, and without any extra 
complication.
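
A rough sketch of that two-step arrangement, again in plain Java rather than 
against the actual Avro API.  The names decodeInto and encodeFrom are 
hypothetical, the "wire" is just an iterator of already-parsed values, and 
reader-side default values and JSON string escaping are deliberately left 
out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class TwoPhaseSketch {

    /**
     * Step 1 (push): walk the writer's field order, consuming one value per
     * field so we stay aligned with the stream, and push only the fields the
     * reader schema keeps into the in-memory record.
     */
    static Map<String, Object> decodeInto(List<String> writerFields,
                                          Iterator<Object> wire,
                                          List<String> readerFields) {
        Map<String, Object> record = new HashMap<>();
        for (String field : writerFields) {
            Object value = wire.next();            // always consume, even if dropped
            if (readerFields.contains(field)) {
                record.put(field, value);          // kept by resolution
            }
        }
        return record;
    }

    /**
     * Step 2 (pull): walk the reader's field order and pull each value from
     * the in-memory record, which acts as the reordering buffer.
     */
    static String encodeFrom(Map<String, Object> record, List<String> readerFields) {
        StringBuilder json = new StringBuilder("{");
        for (String field : readerFields) {
            if (json.length() > 1) {
                json.append(',');
            }
            Object value = record.get(field);
            json.append('"').append(field).append("\":");
            if (value instanceof CharSequence) {
                json.append('"').append(value).append('"');
            } else {
                json.append(value);
            }
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> writerFields = Arrays.asList("id", "debug", "name");
        List<String> readerFields = Arrays.asList("name", "id");
        Iterator<Object> wire = Arrays.<Object>asList(7, "tmp", "avro").iterator();

        Map<String, Object> record = decodeInto(writerFields, wire, readerFields);
        System.out.println(encodeFrom(record, readerFields));
    }
}
```

Neither step needs to know the other's field order: the decoder only follows 
the writer schema, the encoder only follows the reader schema, and the map in 
between absorbs the difference.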

> Java: Data Flow Overhaul -- Composition and Symmetry
> ----------------------------------------------------
>
>                 Key: AVRO-859
>                 URL: https://issues.apache.org/jira/browse/AVRO-859
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>
> Data flow in Avro is currently broken into two parts:  Read and Write.  These 
> share many common patterns but almost no common code.  
> Additionally, the APIs for this are DatumReader and DatumWriter, which 
> requires that implementations know how to traverse Schemas and use the 
> Resolver.
> This is a proposal to overhaul the inner workings of Avro Java between the 
> Decoder/Encoder APIs and DatumReader/DatumWriter such that there is 
> significantly more code re-use and much greater opportunity for new features 
> that can all share in general optimizations and dynamic code generation.
> The two primary concepts involved are:
> * _*Functional Composition*_
> * _*Symmetry*_
> h4. Functional Composition
> All read and write operations can be broken into functional bits and composed 
> rather than writing monolithic classes.  This allows a "DatumWriter2" to be a 
> graph of functions that pre-compute all state required from a schema rather 
> than traverse a schema for each write.
> h4. Symmetry
> Avro's data flow can be made symmetric.  Rather than thinking in terms of 
> Read and Write, think in terms of:
> * _*Source*_: Where data that is represented by an Avro schema comes from -- 
> this may be a Decoder, or an Object graph.
> * _*Target*_: Where data that represents an Avro schema is sent -- this may 
> be an Encoder or an Object graph.
> (More detail in the comments)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
