[ 
https://issues.apache.org/jira/browse/AVRO-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069091#comment-13069091
 ] 

Douglas Creager commented on AVRO-859:
--------------------------------------

Awesome stuff.  Whenever we decide to implement the Haskell Avro library, this 
will be a good definition of the inevitable monad that we'll have to write.  :-)

I've also been working on something similar in the C library.  Hopefully we can 
have some cross-pollination of ideas here.

It started off with the “consumer” interface that I introduced in AVRO-762.  I 
think this corresponds to the Target in your description above.  In addition to 
the generic consumer interface, I wrote an implementation of that consumer 
interface that would perform schema resolution, along with a generic function 
that would consume binary Avro data and pass the results into a consumer.

The natural next step would've been to add a “producer” interface, which 
would've corresponded to the Source in your model.  However, the one main issue 
I had with this approach is that you'd have two competing models: one where you 
push data through a chain of consumers, and one where you pull data through a 
chain of producers.  It didn't seem like either pushing or pulling could be 
used as the “one true way”.
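
To make the push/pull tension concrete, here is a minimal sketch of the two 
styles side by side.  All names are hypothetical and are not taken from the 
AVRO-762 code; a plain array of longs stands in for decoded Avro data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Push model: the data source drives and the consumer receives callbacks.
 * (Hypothetical interface, for illustration only.) */
typedef struct sketch_consumer {
    int (*on_long)(struct sketch_consumer *self, int64_t v);
    int64_t total;                      /* demo state: running sum */
} sketch_consumer_t;

static int sum_on_long(sketch_consumer_t *self, int64_t v) {
    self->total += v;
    return 0;
}

/* Pushes every element into the consumer; stops at the first error. */
static int push_all(const int64_t *data, size_t n, sketch_consumer_t *c) {
    assert(c != NULL && c->on_long != NULL);
    for (size_t i = 0; i < n; i++) {
        int rc = c->on_long(c, data[i]);
        if (rc != 0) return rc;
    }
    return 0;
}

/* Pull model: the caller drives and asks the producer for the next item.
 * next() returns 1 when it produced a value and 0 at end of data. */
typedef struct sketch_producer {
    int (*next)(struct sketch_producer *self, int64_t *out);
    const int64_t *data;
    size_t n, pos;
} sketch_producer_t;

static int array_next(sketch_producer_t *self, int64_t *out) {
    if (self->pos >= self->n) return 0;
    *out = self->data[self->pos++];
    return 1;
}

/* Pulls values until the producer is exhausted and sums them. */
static int64_t pull_sum(sketch_producer_t *p) {
    int64_t total = 0, v = 0;
    while (p->next(p, &v) == 1) total += v;
    return total;
}
```

Both halves compute the same sum, but the control flow is inverted: in the 
first the loop lives with the data source, in the second with the caller.  
Chaining several stages forces you to pick one direction for the whole chain.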

To get around this, I decided to go with a new “value” interface (AVRO-837), 
rather than separate consumer and producer interfaces.  In this model, an 
{{avro_value_t}} is anything that can mimic an Avro value.  It's basically a 
big collection of getter and setter methods for the content of an Avro value of 
a particular schema.  Binary decoding doesn't have its own value 
implementation, but it can use the setter methods to fill in any value 
implementation, including one that just immediately serializes the contents 
into a JSON encoding, for instance.
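
A rough sketch of that idea, under stated assumptions: the {{sketch_*}} names 
below are hypothetical and are not the actual AVRO-837 API, and the schema is 
reduced to a single "long" so the example stays short.  The decoder only knows 
about the setter, so the same decoding function can fill an in-memory value or 
one that writes JSON text straight away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical value interface: a table of getters and setters. */
typedef struct sketch_value sketch_value_t;

typedef struct {
    int (*set_long)(sketch_value_t *self, int64_t v);
    /* May be NULL for write-only implementations. */
    int (*get_long)(const sketch_value_t *self, int64_t *out);
} sketch_value_iface_t;

struct sketch_value {
    const sketch_value_iface_t *iface;
    void *state;                        /* implementation-specific storage */
};

/* Implementation 1: keeps the long in memory. */
static int mem_set_long(sketch_value_t *self, int64_t v) {
    *(int64_t *)self->state = v;
    return 0;
}
static int mem_get_long(const sketch_value_t *self, int64_t *out) {
    *out = *(const int64_t *)self->state;
    return 0;
}
static const sketch_value_iface_t MEM_IFACE = { mem_set_long, mem_get_long };

/* Implementation 2: serializes to JSON text as soon as it is set. */
typedef struct { char buf[32]; } json_sink_t;

static int json_set_long(sketch_value_t *self, int64_t v) {
    json_sink_t *sink = self->state;
    int n = snprintf(sink->buf, sizeof sink->buf, "%lld", (long long)v);
    return (n < 0 || (size_t)n >= sizeof sink->buf) ? -1 : 0;
}
static const sketch_value_iface_t JSON_IFACE = { json_set_long, NULL };

/* Binary decoding has no value implementation of its own: it reads an
 * Avro long (zig-zag varint) and hands it to whatever value it was given. */
int sketch_decode_long(const uint8_t *buf, size_t len, sketch_value_t *dest) {
    assert(dest != NULL && dest->iface->set_long != NULL);
    uint64_t raw = 0;
    int shift = 0;
    for (size_t i = 0; i < len && shift < 64; i++, shift += 7) {
        raw |= (uint64_t)(buf[i] & 0x7f) << shift;
        if ((buf[i] & 0x80) == 0) {     /* high bit clear: last byte */
            int64_t v = (int64_t)(raw >> 1) ^ -(int64_t)(raw & 1);
            return dest->iface->set_long(dest, v);
        }
    }
    return -1;                          /* truncated or overlong input */
}
```

Swapping {{MEM_IFACE}} for {{JSON_IFACE}} changes where the decoded data ends 
up without touching the decoder, which is the point of having one interface 
instead of separate consumer and producer interfaces.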

Schema resolution can then be implemented as two separate value 
implementations.  (I have this one coded up, but I don't have an issue open for 
it yet.  I should get on that.)  The schema resolution classes provide a “view” 
into an existing Avro value, allowing you to treat it as if it were an instance 
of a different schema.  You need two classes because the wrapped value might be 
on either the “writer schema” or “reader schema” end of the resolution process.

> Java: Data Flow Overhaul -- Composition and Symmetry
> ----------------------------------------------------
>
>                 Key: AVRO-859
>                 URL: https://issues.apache.org/jira/browse/AVRO-859
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>
> Data flow in Avro is currently broken into two parts:  Read and Write.  These 
> share many common patterns but almost no common code.  
> Additionally, the APIs for this are DatumReader and DatumWriter, which 
> requires that implementations know how to traverse Schemas and use the 
> Resolver.
> This is a proposal to overhaul the inner workings of Avro Java between the 
> Decoder/Encoder APIs and DatumReader/DatumWriter such that there is 
> significantly more code re-use and much greater opportunity for new features 
> that can all share in general optimizations and dynamic code generation.
> The two primary concepts involved are:
> * _*Functional Composition*_
> * _*Symmetry*_
> h4. Functional Composition
> All read and write operations can be broken into functional bits and composed 
> rather than writing monolithic classes.  This allows a "DatumWriter2" to be a 
> graph of functions that pre-compute all state required from a schema rather 
> than traverse a schema for each write.
> h4. Symmetry
> Avro's data flow can be made symmetric.  Rather than thinking in terms of 
> Read and Write, think in terms of:
> * _*Source*_: Where data that is represented by an Avro schema comes from -- 
> this may be a Decoder, or an Object graph.
> * _*Target*_: Where data that represents an Avro schema is sent -- this may 
> be an Encoder or an Object graph.
> (More detail in the comments)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

