[
https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. updated AVRO-25:
------------------------------------
Attachment: AVRO-25.patch
This patch addresses most of the issues raised by Doug.
- The names of the classes are Writer and Reader with the new API.
- I've removed the stacks that were used to validate reading and writing.
- Changed the API so that the client must give the number of map/array entries
before writing the entries themselves. This change was required to get rid of
the stack in the writer.
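
To illustrate the count-first idea, here is a minimal sketch. The names
CountFirstWriter and writeLongArray are hypothetical and are not the API in the
patch. The byte layout follows the Avro spec for arrays (a block count, the
items, then a zero count as terminator). Because the count goes out before the
items, the writer in this sketch keeps no stack of open arrays.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the patch's API: the item count is written
// up front, so the writer keeps no stack of open arrays or maps.
class CountFirstWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Zig-zag variable-length encoding of a long, as described in the Avro spec.
    void writeLong(long n) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Writes the array as one block: count first, then items, then a 0 terminator.
    void writeLongArray(long[] items) {
        if (items.length > 0) {
            writeLong(items.length);
            for (long item : items) {
                writeLong(item);
            }
        }
        writeLong(0);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```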
I ran some performance tests to compare the new Reader/Writer API with the old
one. After removing the stack, the BasicValueWriter is marginally (2 to 4%)
faster than the old one. The BlockingValueWriter is about 10% faster (see the
note below).
The read performance with data written by BasicValueWriter is more or less the
same as the original. There is no significant difference in performance for
both full read and projection read (where some fields are absent in reader's
record schema). With the data written by BlockingValueWriter the full read
performance is similar to the old code. With projection read the performance is
comparable if the field being ignored is not an array or map. If an array or
map field is ignored due to projection, as expected, the performance is about
three to four times better compared to the original.
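
One plausible reason for the speedup, sketched below: per the Avro spec, a
block with a negative count is followed by its size in bytes, so a reader can
jump over the block without decoding its items. BlockSkipper and its methods
are hypothetical names for illustration only, not the reader in the patch, and
the sketch handles only arrays of longs held in a byte array.

```java
// Hypothetical sketch: skipping an Avro array of longs held in a byte array.
class BlockSkipper {
    private final byte[] data;
    private int pos;

    BlockSkipper(byte[] data) {
        this.data = data;
    }

    // Decodes one zig-zag variable-length long at the current position.
    long readLong() {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos++] & 0xFF;
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (v >>> 1) ^ -(v & 1);
    }

    // Skips a whole array and returns how many items had to be decoded
    // one by one (zero when every block carries a byte size).
    int skipLongArray() {
        int decoded = 0;
        for (long count = readLong(); count != 0; count = readLong()) {
            if (count < 0) {
                // Negative count: the block's byte size follows, so jump over it.
                long byteSize = readLong();
                pos += (int) byteSize;
            } else {
                // No byte size available: each item must be decoded to skip it.
                for (long i = 0; i < count; i++) {
                    readLong();
                    decoded++;
                }
            }
        }
        return decoded;
    }

    int position() {
        return pos;
    }
}
```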
Note: I've not implemented one change suggested by Doug. There are still two
versions each of encodeLong(), encodeDouble() and encodeFloat(). One version
encodes into a stream and the other into a buffer. When I tried to have a
single version (that writes to a stream) and use ByteArrayOutputStream and
arrayCopy as a substitute for the buffer version, the performance of
BlockingValueWriter fell by about 40%. With the buffer version, the performance
went up by 10%. This is perhaps because, for example, encoding each long will
require about 5 calls to OutputStream.write(int). The buffer version makes no
function calls.
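
For illustration, the two flavours might look roughly like the sketch below.
This is not the code in the patch; LongEncoding and both method signatures are
assumptions made for the example. The stream version makes one write(int) call
per output byte, while the buffer version stores bytes directly into an array
and returns the new position.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the two encodeLong() variants discussed above.
class LongEncoding {
    // Buffer version: stores bytes straight into buf starting at pos and
    // returns the position after the last byte written. No per-byte calls.
    static int encodeLong(long n, byte[] buf, int pos) {
        long v = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes stay small
        while ((v & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[pos++] = (byte) v;
        return pos;
    }

    // Stream version: same encoding, but one write(int) call per byte.
    static void encodeLong(long n, ByteArrayOutputStream out) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }
}
```

A caller of the buffer version has to make sure the array has room for the
longest encoding, which is 10 bytes for a long.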
> Blocking for value output (with API change)
> -------------------------------------------
>
> Key: AVRO-25
> URL: https://issues.apache.org/jira/browse/AVRO-25
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Thiruvalluvan M. G.
> Attachments: AVRO-25.patch, AVRO-25.patch, AVRO-25.patch
>
>
> The Avro specification has provisions for decomposing very large arrays and
> maps into "blocks." These provisions allow for streaming implementations
> that would allow one to, for example, write the contents of a file out as an
> Avro array w/out knowing in advance how many records are in the file.
> The current Java implementation of Avro does not support this provision. My
> colleague Thiru will be attaching a patch which implements blocking. It
> turns out that the buffering required to do blocking is non-trivial, so it
> seems beneficial to include a standard implementation of blocking as part of
> the reference Avro implementation.
> This is an early version of the code. We are still working on testing and
> performance tuning. But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and
> ValueOutput, which are meant to replace ValueReader and ValueWriter. These
> classes have largely the same API as ValueReader/Writer, but they include a
> few more methods to "bracket" items that appear inside of arrays and maps.
> Shortly, we'll be posting a separate patch which implements further
> subclasses of ValueInput/Output that do "validation" of input and output
> against a schema (and also do automatic schema resolution for readers).
> We're implementing these classes separate from ValueInput/Output to allow you
> to kick our tires w/out causing too much disruption to your source trees.
> Let's validate the basic idea behind these patches first, and then determine
> the details of integrating them into the rest of Avro.