[ https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-25:
------------------------------------

    Status: Patch Available  (was: Open)

Here is the patch. It adds a few new classes and their tests; the existing code 
is not touched.

Please start with the abstract classes ValueOutput and ValueInput, and then look 
at their concrete implementations BasicValueOutput, BlockingValueOutput and 
BasicValueInput. BasicValueOutput produces encoded data identical to that of the 
existing implementation. BlockingValueOutput encodes additional information that 
lets readers skip large arrays and maps faster. BasicValueInput can read both 
the non-blocking and the blocking versions of the binary stream.

There is a single change to the binary format for blocking support. The 
existing format encodes the number of items in an array/map before the items 
themselves. The new format continues to support this, so binary streams created 
by writers without blocking support can still be read by readers with blocking 
support. To add blocking support, we encode the item count as a negative number 
instead of the usual positive number. If the item count is negative, it is 
followed by the number of bytes occupied by the elements themselves. A reader 
can therefore skip the elements simply by skipping that many bytes, without 
decoding the individual entries. A less sophisticated reader can simply read 
the byte count and ignore it; it must, of course, negate the item count to make 
it positive. As before, the end of an array/map is indicated by a zero item 
count; the new format does not change that.
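A minimal Java sketch of the writer side of this format follows. It is not the 
patch's code: the class and method names are hypothetical, items are assumed to 
be longs, and counts and byte sizes are assumed to use Avro's zig-zag 
variable-length long encoding. The items are encoded into a separate buffer 
first, because the byte count has to be written before them.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Sketch of the array encoding described above (not the patch's code).
// Assumes long items and Avro-style zig-zag varint counts and sizes.
final class BlockWriterSketch {

    // Zig-zag maps small magnitudes to small codes; the varint then uses
    // 7 bits per byte with the high bit set on all but the last byte.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // blocking == false: count, items, 0            (existing format)
    // blocking == true:  -count, byteSize, items, 0 (blocking format)
    static byte[] encodeArray(List<Long> items, boolean blocking) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!items.isEmpty()) {
            // Encode the items first so that their byte size is known.
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            for (long item : items) {
                writeLong(body, item);
            }
            if (blocking) {
                writeLong(out, -items.size()); // negative count: a byte size follows
                writeLong(out, body.size());   // bytes occupied by the items
            } else {
                writeLong(out, items.size());
            }
            out.write(body.toByteArray(), 0, body.size());
        }
        writeLong(out, 0); // a zero count ends the array in both formats
        return out.toByteArray();
    }
}
```

For the items 1, 2, 3 the blocking form is the bytes 5 6 2 4 6 0 (count -3, 
size 3, the three items, terminator) and the non-blocking form is 6 2 4 6 0.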

Thus readers with blocking support can read binary streams both with and 
without blocking support. Readers without blocking support can also read binary 
streams with blocking support, given the small tweak of interpreting a negative 
item count as described above.
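On the reading side, a sketch under the same assumptions (hypothetical names, 
long items, zig-zag varint counts and sizes) shows both behaviours: readArray 
applies the small tweak, negating a negative count and discarding the byte 
size, while skipArray uses the byte size to jump over a block without decoding 
it. Blocks written in the old format still have to be decoded item by item in 
order to be skipped. InputStream.skipNBytes requires Java 12 or later.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Sketch of a reader that accepts both formats (not the patch's code).
// Assumes long items and Avro-style zig-zag varint counts and sizes.
final class BlockReaderSketch {

    // Decodes one zig-zag varint long.
    static long readLong(InputStream in) throws IOException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zig-zag
    }

    // Decodes every item; a negative count is negated and its byte size ignored.
    static List<Long> readArray(InputStream in) throws IOException {
        List<Long> items = new ArrayList<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // byte size: not needed when decoding items
            }
            for (long i = 0; i < count; i++) {
                items.add(readLong(in));
            }
        }
        return items;
    }

    // Skips an array and returns how many items it contained.
    static long skipArray(InputStream in) throws IOException {
        long skipped = 0;
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                long size = readLong(in);
                in.skipNBytes(size); // blocking format: jump over the items
                skipped += -count;
            } else {
                for (long i = 0; i < count; i++) {
                    readLong(in); // old format: items must be decoded to skip them
                }
                skipped += count;
            }
        }
        return skipped;
    }
}
```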

> Blocking for value output (with API change)
> -------------------------------------------
>
>                 Key: AVRO-25
>                 URL: https://issues.apache.org/jira/browse/AVRO-25
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>
> The Avro specification has provisions for decomposing very large arrays and 
> maps into "blocks."  These provisions allow for streaming implementations 
> that would allow one to, for example, write the contents of a file out as an 
> Avro array w/out knowing in advance how many records are in the file.
> The current Java implementation of Avro does not support this provision.  My 
> colleague Thiru will be attaching a patch which implements blocking.  It 
> turns out that the buffering required to do blocking is non-trivial, so it 
> seems beneficial to include a standard implementation of blocking as part of 
> the reference Avro implementation.
> This is an early version of the code.  We are still working on testing and 
> performance tuning.  But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and 
> ValueOutput, which are meant to replace ValueReader and ValueWriter.  These 
> classes have largely the same API as ValueReader/Writer, but they include a 
> few more methods to "bracket" items that appear inside of arrays and maps.  
> Shortly, we'll be posting a separate patch which implements further 
> subclasses of ValueInput/Output that do "validation" of input and output 
> against a schema (and also do automatic schema resolution for readers).
> We're implementing these classes separately from ValueInput/Output to allow you 
> to kick our tires w/out causing too much disruption to your source trees.  
> Let's validate the basic idea behind these patches first, and then determine 
> the details of integrating them into the rest of Avro.
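The buffering mentioned in the description, and the reason for bracketing 
items, can be illustrated with a hypothetical streaming writer. This is a 
sketch, not the patch's ValueOutput API: the method names startItem, writeLong 
and endArray, the byte threshold, and the flush policy are all assumptions. 
Items are buffered until the buffer reaches the threshold; a block (negative 
count, byte size, item bytes) is then emitted, but only at an item boundary, 
which is why the writer has to be told where each item starts. The total 
number of items never needs to be known in advance.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical streaming array writer; names and flush policy are assumptions,
// not the API in the AVRO-25 patch. An item may consist of several longs.
final class StreamingArrayWriterSketch {
    private final ByteArrayOutputStream out;                                  // destination
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // current block
    private final int blockSizeLimit; // assumed threshold; a block may exceed it by one item
    private long itemsInBlock = 0;

    StreamingArrayWriterSketch(ByteArrayOutputStream out, int blockSizeLimit) {
        this.out = out;
        this.blockSizeLimit = blockSizeLimit;
    }

    // Zig-zag variable-length long encoding (assumed, as in Avro).
    private static void encodeLong(ByteArrayOutputStream o, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            o.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        o.write((int) z);
    }

    // Brackets the start of an item. Blocks are only cut here, so an item
    // made of several values is never split across two blocks.
    void startItem() {
        if (buffer.size() >= blockSizeLimit) flushBlock();
        itemsInBlock++;
    }

    // Writes one value belonging to the current item into the block buffer.
    void writeLong(long value) {
        encodeLong(buffer, value);
    }

    // Ends the array: flushes the last partial block and writes the 0 terminator.
    void endArray() {
        flushBlock();
        encodeLong(out, 0);
    }

    // Emits the buffered items as one block: -count, byte size, item bytes.
    private void flushBlock() {
        if (itemsInBlock == 0) return;
        encodeLong(out, -itemsInBlock);
        encodeLong(out, buffer.size());
        out.write(buffer.toByteArray(), 0, buffer.size());
        buffer.reset();
        itemsInBlock = 0;
    }
}
```

With a 2-byte threshold and the items (1, 2), (3, 4), (5, 6), this produces 
three one-item blocks followed by the zero terminator; with a large threshold 
it produces a single block of three items.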

-- 
This message is automatically generated by JIRA.