[ 
https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvalluvan M. G. updated AVRO-25:
------------------------------------

    Attachment: AVRO-25.patch

Integrated ValueOutput and ValueInput with the rest of Avro. ValueWriter, 
ValueReader, ByteBufferValueWriter and ByteBufferValueReader are no longer used.

Removed duplication of encoding logic. The encoding methods are now static 
methods in a new class, Encoder.

I have not carried out one of Doug's suggestions: I have not eliminated the 
stack in BasicValueWriter. The stack there performs two functions:
- Basic validation of the call sequence, for example that writeArrayEnd() is 
called only after a writeArrayStart(). We can give up this validation by 
removing the stack.
- It records whether the caller supplied a count. There are two overloaded 
methods each for writeArrayStart() and writeMapStart(): one takes the number 
of elements in the array/map and the other does not. The latter version is 
useful if the client does not know the exact number of elements in the 
container. The situation arises, for example, when streaming from a JSON array 
or map. The BlockingValueWriter can handle this situation because it buffers 
the contents. The BasicValueWriter handles it by writing a single-element 
block every time startItem() is called. To achieve this, BasicValueWriter must 
know whether the previous call to writeArrayStart()/writeMapStart() supplied 
the count or not. We need the stack to store that.

If we want to remove the stack anyway, I see three options:
- Remove the no-arg writeArrayStart()/writeMapStart() and force the client to 
give the count (if it does not have it, it can supply a unit count for each 
entry). The trouble with this is that we'll lose performance with 
BlockingValueWriter as it will not be able to skip multiple entries at once.
- Make the BasicValueWriter ignore the initial count supplied with 
writeArrayStart(int)/writeMapStart(int) and always write single-entry blocks. 
This will slightly increase the encoded data length.
- Insist that the client always gives the count (even if it is forced to give 
1) and then let BlockedValueOutput optimize item counts; that is, it should 
make larger blocks of entries if the entries are too small for the buffer. 
This is tricky, but we can make it work.

What do you think?
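
To make the trade-off concrete, here is a minimal sketch of the stack's second 
function. It is not the code in the patch: the class SketchArrayWriter and its 
encode() helper are hypothetical, only arrays of ints are handled, and the 
sketch does not check that the number of items matches the supplied count. It 
assumes the block layout from the Avro specification (each block is an item 
count followed by the items, and a zero count ends the array) and reuses the 
method names mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; not the BasicValueWriter from the patch. */
class SketchArrayWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // One entry per open array: true if the caller supplied the count.
    private final Deque<Boolean> countKnown = new ArrayDeque<>();

    /** Avro int/long encoding: zig-zag, then 7 bits per byte, low bits first. */
    void writeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Count unknown: startItem() will emit a one-item block per item. */
    void writeArrayStart() {
        countKnown.push(false);
    }

    /** Count known: emit a single block header covering all items. */
    void writeArrayStart(long count) {
        if (count < 0) throw new IllegalArgumentException("negative count");
        countKnown.push(true);
        if (count > 0) writeLong(count);
    }

    void startItem() {
        if (countKnown.isEmpty()) {
            throw new IllegalStateException("startItem() outside an array");
        }
        if (!countKnown.peek()) writeLong(1); // single-element block
    }

    void writeArrayEnd() {
        if (countKnown.isEmpty()) {
            throw new IllegalStateException(
                "writeArrayEnd() without writeArrayStart()");
        }
        countKnown.pop();
        writeLong(0); // a zero count terminates the array
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    /** Encodes an int array with or without an up-front count. */
    static byte[] encode(int[] items, boolean supplyCount) {
        SketchArrayWriter w = new SketchArrayWriter();
        if (supplyCount) w.writeArrayStart(items.length);
        else w.writeArrayStart();
        for (int item : items) {
            w.startItem();
            w.writeLong(item);
        }
        w.writeArrayEnd();
        return w.toByteArray();
    }

    public static void main(String[] args) {
        int[] items = {1, 2, 3};
        System.out.println(encode(items, true).length);  // prints 5
        System.out.println(encode(items, false).length); // prints 7
    }
}
```

With the count supplied, the three items cost one block header; without it, 
each item carries its own one-byte header. That per-item header is the 
"slightly increased" length mentioned for the second option.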

> Blocking for value output (with API change)
> -------------------------------------------
>
>                 Key: AVRO-25
>                 URL: https://issues.apache.org/jira/browse/AVRO-25
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-25.patch, AVRO-25.patch
>
>
> The Avro specification has provisions for decomposing very large arrays and 
> maps into "blocks."  These provisions allow for streaming implementations 
> that would allow one to, for example, write the contents of a file out as an 
> Avro array w/out knowing in advance how many records are in the file.
> The current Java implementation of Avro does not support this provision.  My 
> colleague Thiru will be attaching a patch which implements blocking.  It 
> turns out that the buffering required to do blocking is non-trivial, so it 
> seems beneficial to include a standard implementation of blocking as part of 
> the reference Avro implementation.
> This is an early version of the code.  We are still working on testing and 
> performance tuning.  But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and 
> ValueOutput, which are meant to replace ValueReader and ValueWriter.  These 
> classes have largely the same API as ValueReader/Writer, but they include a 
> few more methods to "bracket" items that appear inside of arrays and maps.  
> Shortly, we'll be posting a separate patch which implements further 
> subclasses of ValueInput/Output that do "validation" of input and output 
> against a schema (and also do automatic schema resolution for readers).
> We're implementing these classes separate from ValueInput/Output to allow you 
> to kick our tires w/out causing too much disruption to your source trees.  
> Let's validate the basic idea behind these patches first, and then determine 
> the details of integrating them into the rest of Avro.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
