[
https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. updated AVRO-25:
------------------------------------
Status: Patch Available (was: Open)
Here is the patch. It adds a few new classes and their tests. The existing code
is not touched.
Please start with the abstract classes ValueOutput and ValueInput and then look
at their concrete implementations BasicValueOutput, BlockingValueOutput and
BasicValueInput. The BasicValueOutput encodes data identical to the existing
implementation. BlockingValueOutput encodes additional information to allow
readers to skip large arrays and maps faster. The BasicValueInput can read both
the non-blocking and blocking versions of the binary stream.
There is a single change to the binary format for blocking support. The
existing format encodes the number of items for arrays/maps before items
themselves are encoded. In the new format we continue to support this method.
So binary streams created by writers with no blocking support will still be
read by readers with blocking support. To add blocking support, we encode the
item count as a negative number instead of the usual positive number. If the
item count is negative, then it is followed by the number of bytes occupied by
the elements themselves. So the reader simply needs to skip that many bytes to
skip the elements. The reader need not decode the individual entries.
If the reader is not sophisticated enough, it can simply read the byte count
and ignore it. Of course, it should negate the item count to make it positive.
To indicate the end of array/map, the existing format encodes zero item count.
There is no change to that in the new format.
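For illustration only, here is a rough sketch of the format described above. It is not code from the patch: the class BlockingSketch and its methods are hypothetical, the items are plain longs, and it assumes counts and byte sizes use Avro's zig-zag variable-length long encoding, which this message does not spell out.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not part of the AVRO-25 patch.
final class BlockingSketch {
    private final byte[] buf;
    private int pos;

    BlockingSketch(byte[] buf) { this.buf = buf; }

    /** Writes n as a zig-zag variable-length long (assumed encoding). */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Writes one block: negative item count, byte size, then the items. */
    static void writeBlock(ByteArrayOutputStream out, long[] items) {
        if (items.length == 0) return;   // a zero count would mean end of array
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (long item : items) writeLong(body, item);
        writeLong(out, -items.length);   // negative count: a byte size follows
        writeLong(out, body.size());     // bytes occupied by the items
        out.write(body.toByteArray(), 0, body.size());
    }

    /** Reads one zig-zag variable-length long at the current position. */
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    /** Skips an entire array of longs and returns how many items it held. */
    long skipArray() {
        long skipped = 0;
        for (long count = readLong(); count != 0; count = readLong()) {
            if (count < 0) {
                long byteSize = readLong();
                pos += (int) byteSize;   // jump over the items without decoding
                skipped += -count;
            } else {
                // Old-style block without a byte size: decode each item.
                for (long i = 0; i < count; i++) readLong();
                skipped += count;
            }
        }
        return skipped;
    }
}
```

With two blocks written by writeBlock followed by a zero count, skipArray leaves the position on the first byte after the array, and only the count and byte size of each block are decoded along the way.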
Thus readers with blocking support can read binary streams both with and
without blocking support. Also, readers without blocking support can still read
a binary stream with blocking support, given a small tweak to interpret the
negative item count.
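The small tweak could look roughly like the sketch below. Again this is an illustration under assumptions, not code from the patch: TolerantArrayReader is a made-up name, the array holds plain longs, and counts and byte sizes are assumed to be zig-zag variable-length longs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a reader without skipping support.
final class TolerantArrayReader {
    private final byte[] buf;
    private int pos;

    TolerantArrayReader(byte[] buf) { this.buf = buf; }

    /** Reads one zig-zag variable-length long (assumed encoding). */
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    /** Decodes every item of an array of longs, blocked or not. */
    List<Long> readLongArray() {
        List<Long> items = new ArrayList<>();
        for (long count = readLong(); count != 0; count = readLong()) {
            if (count < 0) {
                readLong();       // byte count: read it and ignore it
                count = -count;   // negate to get the real item count
            }
            for (long i = 0; i < count; i++) items.add(readLong());
        }
        return items;
    }
}
```

The only difference from a reader for the existing format is the `count < 0` branch, so a stream with no negative counts is decoded exactly as before.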
> Blocking for value output (with API change)
> -------------------------------------------
>
> Key: AVRO-25
> URL: https://issues.apache.org/jira/browse/AVRO-25
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Thiruvalluvan M. G.
>
> The Avro specification has provisions for decomposing very large arrays and
> maps into "blocks." These provisions allow for streaming implementations
> that would allow one to, for example, write the contents of a file out as an
> Avro array w/out knowing in advance how many records are in the file.
> The current Java implementation of Avro does not support this provision. My
> colleague Thiru will be attaching a patch which implements blocking. It
> turns out that the buffering required to do blocking is non-trivial, so it
> seems beneficial to include a standard implementation of blocking as part of
> the reference Avro implementation.
> This is an early version of the code. We are still working on testing and
> performance tuning. But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and
> ValueOutput, which are meant to replace ValueReader and ValueWriter. These
> classes have largely the same API as ValueReader/Writer, but they include a
> few more methods to "bracket" items that appear inside of arrays and maps.
> Shortly, we'll be posting a separate patch which implements further
> subclasses of ValueInput/Output that do "validation" of input and output
> against a schema (and also do automatic schema resolution for readers).
> We're implementing these classes separate from ValueInput/Output to allow you
> to kick our tires w/out causing too much disruption to your source trees.
> Let's validate the basic idea behind these patches first, and then determine
> the details of integrating them into the rest of Avro.