[ 
https://issues.apache.org/jira/browse/AVRO-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831253#action_12831253
 ] 

Todd Lipcon commented on AVRO-406:
----------------------------------

bq. if you say only the first enclosing array is 'streaming' that means the 
sub-array is NOT streamed, right?

Correct. To be really technical, on a wire level we *could* stream any 
structure that is "tail streamable"... by which I mean array<foo>, or 
array<array<foo>>, or array<array<MyRecord>> where MyRecord's last field is 
"tail streamable". However, it will be impossible to enforce that clients or 
servers consume/provide the values in the correct order. For example:

{code}
void doStuff(Iterable<Iterable<Foo>> inputFoos) {
  for (Iterable<Foo> fooIter : inputFoos) {
    for (Foo foo : fooIter) {
      // do something with foo
    }
  }
}
{code}

could work, since the user is consuming the input in the same order it's being 
serialized on the wire. However, if the outer iterator were advanced before all 
of the inner iterator's data had been consumed, it would no longer work (the 
second array<Foo> isn't available until the first array<Foo> is done). Granted, 
we could "skip ahead" at this point, but I think that complexity would be a bad 
idea, and the behavior probably wouldn't be clear to framework users either.
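
To make the ordering constraint concrete, here is a rough sketch of a 
forward-only reader. Everything in it is hypothetical: the class name, the 
toy wire format (a count followed by that many items, with -1 ending the outer 
array) and the use of Integer in place of Foo are assumptions for 
illustration, not Avro's encoding or API. The type 
Iterator<Iterator<Integer>> cannot express the ordering rule, so the best 
this sketch can do is fail at runtime:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical forward-only reader; not part of the Avro API. */
class NestedStreamSketch {

  /**
   * Assumed toy wire format: each inner array is a count followed by that
   * many items; a count of -1 ends the outer array.
   */
  static Iterator<Iterator<Integer>> read(final Iterator<Integer> wire) {
    return new Iterator<Iterator<Integer>>() {
      private int remaining = 0;   // unread items of the current inner array
      private int generation = 0;  // identifies the current inner array
      private Integer nextCount;   // count of the next inner array, read lazily

      public boolean hasNext() {
        if (remaining > 0) {
          // The next array's bytes sit behind unread items on the wire.
          throw new IllegalStateException("previous inner array not fully consumed");
        }
        if (nextCount == null) {
          nextCount = wire.next();
        }
        return nextCount >= 0;
      }

      public Iterator<Integer> next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        remaining = nextCount;
        nextCount = null;
        final int myGeneration = ++generation;
        return new Iterator<Integer>() {
          public boolean hasNext() {
            // A stale inner iterator must not read a later array's items.
            return generation == myGeneration && remaining > 0;
          }

          public Integer next() {
            if (!hasNext()) {
              throw new NoSuchElementException();
            }
            remaining--;
            return wire.next();
          }
        };
      }
    };
  }

  /** Consumes the input in wire order, which is the case that works. */
  static int sumInOrder(Iterator<Integer> wire) {
    int total = 0;
    Iterator<Iterator<Integer>> outer = read(wire);
    while (outer.hasNext()) {
      Iterator<Integer> inner = outer.next();
      while (inner.hasNext()) {
        total += inner.next();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Two inner arrays: [10, 11] and [20].
    System.out.println(sumInOrder(Arrays.asList(2, 10, 11, 1, 20, -1).iterator()));
  }
}
```

Calling next() on the outer iterator and then hasNext() on it again, without 
draining the inner iterator in between, throws IllegalStateException here. 
That is exactly the kind of surprise I'd rather not hand to framework users.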

For your use case, could you get by with a bit more application-level logic and 
change your array<array<Cell>> to something more like:

{code}
record ResponseChunk {
  boolean continuingPreviousRow;
  array<Cell> cells;
} 
array<ResponseChunk> getCells(...)
{code}

where you'd send a few cells at a time in a ResponseChunk, and unwrap them on 
the other side into whatever user-level API you want?
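
The unwrapping on the receiving side could look roughly like the sketch 
below. The class and method names are made up, String stands in for Cell, 
and a first chunk that claims to continue a previous row is simply treated 
as starting one; none of this is an existing Avro or HBase API. For 
simplicity it collects every row into a list; a real client would more 
likely hand each row to the caller as soon as the next chunk starts a new 
one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical client-side unwrapping of ResponseChunk values. */
class ChunkReassemblySketch {

  /** Mirrors the ResponseChunk record above; String stands in for Cell. */
  static class ResponseChunk {
    final boolean continuingPreviousRow;
    final List<String> cells;

    ResponseChunk(boolean continuingPreviousRow, List<String> cells) {
      this.continuingPreviousRow = continuingPreviousRow;
      this.cells = cells;
    }
  }

  /** Groups the cells of consecutive chunks back into rows. */
  static List<List<String>> reassemble(Iterable<ResponseChunk> chunks) {
    List<List<String>> rows = new ArrayList<List<String>>();
    for (ResponseChunk chunk : chunks) {
      // Start a new row unless this chunk continues the previous one.
      // Assumption: a "continuing" first chunk just starts the first row.
      if (!chunk.continuingPreviousRow || rows.isEmpty()) {
        rows.add(new ArrayList<String>());
      }
      rows.get(rows.size() - 1).addAll(chunk.cells);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<ResponseChunk> chunks = Arrays.asList(
        new ResponseChunk(false, Arrays.asList("a", "b")),
        new ResponseChunk(true, Arrays.asList("c")),
        new ResponseChunk(false, Arrays.asList("d")));
    System.out.println(reassemble(chunks));
  }
}
```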

bq. If so, then streaming excessively large objects in the process of streaming 
normal and other associated objects might not be the right thing to do.

Sorry, I couldn't parse this sentence. Can you explain further what you mean? I 
guess you're referring to streaming large binary values? If so, I think it will 
be impossible to do it in a general way from the API even if the wire protocol 
supports it. The large binary values can always be "chunked" as above and it 
shouldn't be a big hassle for developers, right?
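
For the large-binary case, the application-level chunking might be as simple 
as the following sketch, which lazily cuts an InputStream into byte[] pieces 
that could each be sent as one array element. The class name, method name 
and chunkSize parameter are hypothetical; this is not something Avro provides.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper that splits a large binary value into chunks. */
class BinaryChunkSketch {

  /** Lazily reads chunks of at most chunkSize bytes from the stream. */
  static Iterator<byte[]> chunks(final InputStream in, final int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    return new Iterator<byte[]>() {
      // Only one chunk is held in memory at a time.
      private byte[] pending = readChunk();

      private byte[] readChunk() {
        try {
          byte[] buf = new byte[chunkSize];
          int filled = 0;
          // read() may return fewer bytes than requested, so loop.
          while (filled < chunkSize) {
            int n = in.read(buf, filled, chunkSize - filled);
            if (n < 0) {
              break; // end of stream
            }
            filled += n;
          }
          if (filled == 0) {
            return null; // nothing left to send
          }
          return filled == chunkSize ? buf : Arrays.copyOf(buf, filled);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      public boolean hasNext() {
        return pending != null;
      }

      public byte[] next() {
        if (pending == null) {
          throw new NoSuchElementException();
        }
        byte[] out = pending;
        pending = readChunk();
        return out;
      }
    };
  }

  public static void main(String[] args) {
    Iterator<byte[]> it = chunks(new ByteArrayInputStream(new byte[10]), 4);
    while (it.hasNext()) {
      System.out.println(it.next().length);
    }
  }
}
```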

(It should be noted that this is probably an "advanced feature" that only a 
few hardcore apps will need to use... in particular HBase and Hadoop :) )

> Support streaming RPC calls
> ---------------------------
>
>                 Key: AVRO-406
>                 URL: https://issues.apache.org/jira/browse/AVRO-406
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Todd Lipcon
>
> Avro nicely supports chunking of container types into multiple frames. We 
> need to expose this to the RPC layer to facilitate use cases like the Hadoop 
> Datanode, where a single "RPC" can yield far more data than should be 
> buffered in memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
