[
https://issues.apache.org/jira/browse/AVRO-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724252#action_12724252
]
Doug Cutting commented on AVRO-24:
----------------------------------
> Are bulk transfers already part of the spec?
The idea is that bulk transfers can be efficiently implemented by just using
the 'bytes' type in parameter, field and/or return values. When a large value
of type bytes is transmitted, it generates a separate frame at the transport
layer. Clients can then read and write such large values without copying. On
write, if one passes a large ByteBuffer as a parameter, field or return value,
a reference is passed down and it is written directly to the socket.
Similarly, on read, the ByteBuffer that's read from the socket is returned
directly to the client as the field, parameter or return value.
http://people.apache.org/~cutting/avro/spec.html#Message+Framing
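As a rough illustration of the framing described above, here is a minimal Java sketch. It assumes the framing in the linked spec: each buffer is sent as a four-byte big-endian length followed by the data, and a zero-length buffer ends the message. FrameWriter and writeFramed are hypothetical names, not part of Avro's API. The point is only that a large ByteBuffer can be handed to the channel as its own frame without being copied into an intermediate array.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.List;

// Hypothetical helper for illustration; not an Avro class.
final class FrameWriter {
    private FrameWriter() {}

    /** Writes each buffer as [4-byte big-endian length][data], then a zero-length terminator. */
    static void writeFramed(WritableByteChannel out, List<ByteBuffer> buffers) throws IOException {
        for (ByteBuffer data : buffers) {
            if (!data.hasRemaining()) {
                continue; // an empty frame would be read as the end-of-message marker
            }
            ByteBuffer header = ByteBuffer.allocate(4);
            header.putInt(data.remaining()); // ByteBuffer is big-endian by default
            header.flip();
            writeFully(out, header);
            // duplicate() shares the underlying bytes, so the payload is not copied
            // and the caller's position and limit are left untouched.
            writeFully(out, data.duplicate());
        }
        // A freshly allocated 4-byte buffer holds zeros, i.e. a length of 0.
        writeFully(out, ByteBuffer.allocate(4));
    }

    private static void writeFully(WritableByteChannel out, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            out.write(buf);
        }
    }
}
```

With a SocketChannel as the target and a direct or memory-mapped ByteBuffer as the payload, the JVM can pass the bytes to the socket without an extra copy in user code.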
This is not yet perfect. First, while Avro permits object reuse, its RPC
framework does not. So, if an RPC method returns a ByteBuffer, a new
ByteBuffer will be allocated per call. However, we could easily add a pool
here to address this.
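A pool of the kind mentioned above might look roughly like the following Java sketch. ByteBufferPool, acquire and release are hypothetical names, and the fixed buffer size and cap on pooled buffers are assumptions made to keep the example small; nothing here reflects how Avro's RPC code is actually structured.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical fixed-size buffer pool for illustration; not an Avro class.
final class ByteBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    ByteBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a cleared buffer, reusing a pooled one when available. */
    synchronized ByteBuffer acquire() {
        ByteBuffer b = free.pollFirst();
        if (b == null) {
            b = ByteBuffer.allocate(bufferSize);
        }
        b.clear(); // reset position and limit; old contents are simply overwritten
        return b;
    }

    /** Hands a buffer back; buffers of another size or beyond the cap are dropped. */
    synchronized void release(ByteBuffer b) {
        if (b.capacity() == bufferSize && free.size() < maxPooled) {
            free.addFirst(b);
        }
    }
}
```

The caller must not touch a buffer after releasing it, which is the main cost of pooling compared with allocating per call.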
Second, sendfile is not yet supported. This would require using an alternate
representation for values of type bytes. One might define something like:
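import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

interface ByteChannelable {
  int write(WritableByteChannel c) throws IOException;  // send the value to c
  int read(ReadableByteChannel c) throws IOException;   // fill the value from c
  byte[] bytes() throws IOException;                    // value as an array
  void bytes(byte[] value) throws IOException;          // set value from an array
  ByteBuffer buffer() throws IOException;               // value as a ByteBuffer
}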
Then one could implement a version of this that contains a FileChannel and a
start and end position whose read and write methods would call transferFrom and
transferTo.
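To make the file-backed variant concrete, here is a hedged Java sketch. The interface is repeated from the comment, with a parameter name and IOException added so that it compiles. FileRegion is a hypothetical name, and treating the range as [start, end) is an assumption. Because the interface returns int, the sketch fails on regions larger than 2 GB; a real version would probably return long.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Interface from the comment; parameter name and IOException are additions.
interface ByteChannelable {
    int write(WritableByteChannel c) throws IOException;
    int read(ReadableByteChannel c) throws IOException;
    byte[] bytes() throws IOException;
    void bytes(byte[] value) throws IOException;
    ByteBuffer buffer() throws IOException;
}

// Hypothetical file-backed value covering bytes [start, end) of a FileChannel.
final class FileRegion implements ByteChannelable {
    private final FileChannel file;
    private final long start;
    private final long end;

    FileRegion(FileChannel file, long start, long end) {
        if (start < 0 || end < start) throw new IllegalArgumentException("bad range");
        this.file = file;
        this.start = start;
        this.end = end;
    }

    @Override
    public int write(WritableByteChannel c) throws IOException {
        long pos = start;
        while (pos < end) {
            // transferTo may use sendfile when the target is a socket channel
            long n = file.transferTo(pos, end - pos, c);
            if (n <= 0) break; // file shorter than the region, or target not accepting bytes
            pos += n;
        }
        return Math.toIntExact(pos - start);
    }

    @Override
    public int read(ReadableByteChannel c) throws IOException {
        long pos = start;
        while (pos < end) {
            long n = file.transferFrom(c, pos, end - pos);
            if (n <= 0) break; // source exhausted
            pos += n;
        }
        return Math.toIntExact(pos - start);
    }

    @Override
    public ByteBuffer buffer() throws IOException {
        // Fallback path: copies the region into memory.
        ByteBuffer b = ByteBuffer.allocate(Math.toIntExact(end - start));
        long pos = start;
        while (b.hasRemaining()) {
            int n = file.read(b, pos);
            if (n < 0) break; // end of file before end of region
            pos += n;
        }
        b.flip();
        return b;
    }

    @Override
    public byte[] bytes() throws IOException {
        ByteBuffer b = buffer();
        byte[] out = new byte[b.remaining()];
        b.get(out);
        return out;
    }

    @Override
    public void bytes(byte[] value) throws IOException {
        if (value.length != end - start) throw new IllegalArgumentException("length mismatch");
        ByteBuffer src = ByteBuffer.wrap(value);
        long pos = start;
        while (src.hasRemaining()) {
            pos += file.write(src, pos);
        }
    }
}
```

Only write and read avoid copying through user space; bytes() and buffer() are there so that code expecting an in-memory value still works.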
We could switch to such a representation by default, instead of using
ByteBuffer (which unfortunately cannot be extended). Note that any Requestor
and Responder can easily be extended to use a different DatumReader, so we
would not have to make this the default.
But first, I thought we'd benchmark things without these changes to get a
baseline.
> benchmark bulk data
> -------------------
>
> Key: AVRO-24
> URL: https://issues.apache.org/jira/browse/AVRO-24
> Project: Avro
> Issue Type: Task
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.0.0
>
>
> It would be good to validate that the RPC wire format is capable of
> transmitting bulk data efficiently. In particular, to be used for HDFS file
> access, it must be able to, when including file data in an RPC response, or
> writing file data in an RPC request:
> - saturate a disk's throughput or a network interface; and
> - not consume much CPU.
> In other words, Avro's RPC should not be a bottleneck in the transfer of file
> data from a remote disk to an application or vice versa, and moreover it
> should leave the vast majority of the CPU for the application.