[jira] Commented: (AVRO-25) Blocking for value output (with API change)

Doug Cutting (JIRA) Thu, 28 May 2009 09:54:09 -0700

    [ https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714054#action_12714054 ]


Doug Cutting commented on AVRO-25:
----------------------------------

> ByteBufferValueWriter and ByteBufferValueReader are no longer used.

I think these are still needed.  They implement an optimization, where buffers 
may be read from the socket and passed to the application without copying.  
Unless I am missing something, this optimization appears to be lost in your 
patch, and it is critical if we want to implement efficient HDFS data access 
over RPC.
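A minimal sketch of the zero-copy idea, for readers unfamiliar with it. It assumes the RPC layer has already filled a java.nio.ByteBuffer from the socket. The class and method names (ZeroCopyBytesReader, readBytesNoCopy, readBytesCopy) are hypothetical and are not the actual ByteBufferValueReader API. The no-copy path hands back a read-only slice that shares storage with the network buffer. The copying path allocates a fresh array per value, which is the overhead the optimization avoids.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not Avro's ByteBufferValueReader): return bytes
 * values as views of a buffer already read from the socket, without copying.
 */
class ZeroCopyBytesReader {
    private final ByteBuffer in; // assumed: buffer previously filled from the network

    ZeroCopyBytesReader(ByteBuffer in) {
        this.in = in;
    }

    /** Returns the next `length` bytes as a read-only view sharing in's storage. */
    ByteBuffer readBytesNoCopy(int length) {
        ByteBuffer view = in.slice();          // view starts at in's current position
        view.limit(length);                    // restrict the view to this one value
        in.position(in.position() + length);   // advance past the value
        return view.asReadOnlyBuffer();        // the application cannot modify the shared bytes
    }

    /** For contrast: the copying approach allocates and fills a new array. */
    byte[] readBytesCopy(int length) {
        byte[] out = new byte[length];
        in.get(out);
        return out;
    }
}
```

Because the returned view aliases the network buffer, the caller must finish with it before that buffer is reused for the next read. That lifetime constraint is the price of skipping the copy.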

Again, let's please not rename ValueReader and ValueWriter in this patch.  It 
will make it harder to maintain the patch as trunk changes.  It makes the patch 
much harder to evaluate, since it includes so many changes that are irrelevant 
to the added functionality.  If we want to argue about naming, we should do it 
in a separate issue, and not get distracted by that here.

Also, for similar reasons, I would prefer we continue with ValueReader and 
ValueWriter as base classes, with BlockingValueWriter overriding methods.  I 
don't particularly like the name "Basic", nor do I see how the abstraction adds 
enough power to balance the lines of code it changes and adds.

As for the stacks in BasicValueInput and BasicValueOutput, these may affect 
performance, and I would prefer not to add such overhead to our simplest, 
fastest implementation.  If we wish to add such checking, we should address it 
in a separate issue, where we can benchmark it, etc.  This issue should ideally 
change the existing writing code as little as possible, so it remains a 
performance baseline and reference implementation.

> The latter version is useful if the client does not know the exact number of 
> elements in the container.

In these cases, clients can either buffer entries and flush them in chunks, or 
they can knowingly write them as a sequence of length=1 chunks without 
buffering.  Automatically and silently generating length=1 chunks doesn't seem 
like a big favor to clients.  BlockingWriter can then ignore the 
client-supplied chunk lengths, since it's buffering.  Could that work?
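To make the two options concrete, here is a rough sketch of a buffering writer for an array of longs. It follows the block layout in the Avro specification: each block is a count followed by that many items, and a zero count ends the array. Longs are zig-zag varint encoded. The class name BlockingLongArrayWriter and its maxBlockItems threshold are hypothetical and are not the patch's BlockingValueWriter. Setting maxBlockItems to 1 yields the length=1 chunks discussed above. A larger value buffers entries and flushes them in chunks. The sketch omits the spec's optional negative-count form, which is followed by a block byte size.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: buffer array items and emit them as Avro-style blocks
 * (count, then items), terminated by a zero count.
 */
class BlockingLongArrayWriter {
    private final ByteArrayOutputStream out;
    private final int maxBlockItems;              // assumed flush threshold chosen by the caller
    private final List<Long> pending = new ArrayList<>();

    BlockingLongArrayWriter(ByteArrayOutputStream out, int maxBlockItems) {
        if (maxBlockItems < 1) throw new IllegalArgumentException("maxBlockItems must be >= 1");
        this.out = out;
        this.maxBlockItems = maxBlockItems;
    }

    /** Buffers one item and emits a block once the buffer is full. */
    void add(long item) {
        pending.add(item);
        if (pending.size() == maxBlockItems) flushBlock();
    }

    /** Emits any buffered items as a final block, then the zero-count terminator. */
    void close() {
        flushBlock();
        writeLong(0);
    }

    private void flushBlock() {
        if (pending.isEmpty()) return;            // a zero count mid-array would end the array early
        writeLong(pending.size());                // block count
        for (long v : pending) writeLong(v);      // block items
        pending.clear();
    }

    /** Avro long encoding: zig-zag, then little-endian base-128 varint. */
    private void writeLong(long n) {
        long z = (n << 1) ^ (n >> 63);            // maps small magnitudes to small unsigned codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
    }
}
```

For the items 1, 2, 3, a threshold of 2 produces the bytes 04 02 04 | 02 06 | 00 (two blocks plus the terminator). A threshold of 1 produces 02 02 | 02 04 | 02 06 | 00, which spends one extra count byte per item.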

> Blocking for value output (with API change)
> -------------------------------------------
>
>                 Key: AVRO-25
>                 URL: https://issues.apache.org/jira/browse/AVRO-25
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-25.patch, AVRO-25.patch
>
>
> The Avro specification has provisions for decomposing very large arrays and 
> maps into "blocks."  These provisions allow for streaming implementations 
> that would allow one to, for example, write the contents of a file out as an 
> Avro array w/out knowing in advance how many records are in the file.
> The current Java implementation of Avro does not support this provision.  My 
> colleague Thiru will be attaching a patch which implements blocking.  It 
> turns out that the buffering required to do blocking is non-trivial, so it 
> seems beneficial to include a standard implementation of blocking as part of 
> the reference Avro implementation.
> This is an early version of the code.  We are still working on testing and 
> performance tuning.  But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and 
> ValueOutput, which are meant to replace ValueReader and ValueWriter.  These 
> classes have largely the same API as ValueReader/Writer, but they include a 
> few more methods to "bracket" items that appear inside of arrays and maps.  
> Shortly, we'll be posting a separate patch which implements further 
> subclasses of ValueInput/Output that do "validation" of input and output 
> against a schema (and also do automatic schema resolution for readers).
> We're implementing these classes separate from ValueInput/Output to allow you 
> to kick our tires w/out causing too much disruption to your source trees.  
> Let's validate the basic idea behind these patches first, and then determine 
> the details of integrating them into the rest of Avro.
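On the reading side, blocking costs little: a reader loops over blocks until it sees a zero count. The sketch below assumes the spec's layout and reads an array of longs from a ByteBuffer. When a block count is negative, it takes the absolute value and reads the block's byte size that follows. The name BlockedLongArrayReader is hypothetical and does not correspond to an existing Avro class.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: decode an Avro-style blocked array of longs. */
class BlockedLongArrayReader {
    static List<Long> readArray(ByteBuffer in) {
        List<Long> items = new ArrayList<>();
        long count;
        while ((count = readLong(in)) != 0) {     // a zero count terminates the array
            if (count < 0) {
                count = -count;
                readLong(in);                     // block size in bytes; a skipping reader could use it
            }
            for (long i = 0; i < count; i++) items.add(readLong(in));
        }
        return items;
    }

    /** Decodes a zig-zag, base-128 varint long. */
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            if (shift > 63) throw new IllegalArgumentException("malformed varint");
            b = in.get();
            z |= (long) (b & 0x7F) << shift;      // accumulate 7 bits at a time, low bits first
            shift += 7;
        } while ((b & 0x80) != 0);                // continuation bit set: more bytes follow
        return (z >>> 1) ^ -(z & 1);              // undo zig-zag
    }
}
```

The same items decode identically however the writer chose to chunk them. This is why the chunking decision can rest with the writer without affecting readers.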

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
