[jira] Commented: (AVRO-24) benchmark bulk data

Doug Cutting (JIRA) Fri, 26 Jun 2009 09:26:00 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724603#action_12724603
 ]


Doug Cutting commented on AVRO-24:
----------------------------------

> Reading 64KB frames might be enough to mask 1ms latency between frames. 
> Otherwise it might need pipelining of multiple frames.

I agree.  I hope 64KB frames will mean we don't need to add an async RPC API.  
Note, however, that the wire format should not change with async: we'll tag 
requests and responses with a call ID anyway, so that we can multiplex over a 
single connection.
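
To illustrate the call-ID idea, here is a minimal sketch.  It is not Avro's 
wire format: the class name CallIdMux and the [int id][int length][payload] 
frame layout are made up for the example.  The point is only that responses 
can arrive in any order and still reach the right caller.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: multiplex calls over one connection using call IDs. */
public class CallIdMux {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<byte[]>> pending =
        new ConcurrentHashMap<>();

    /** Build a frame laid out as [int id][int length][payload] (assumed layout). */
    public static ByteBuffer frame(int id, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
        buf.putInt(id).putInt(payload.length).put(payload);
        buf.flip();
        return buf;
    }

    /** Tag a request with a fresh call ID and remember who is waiting for it. */
    public ByteBuffer encodeRequest(byte[] payload, CompletableFuture<byte[]> result) {
        int id = nextId.getAndIncrement();
        pending.put(id, result);
        return frame(id, payload);
    }

    /** Route a response frame to its caller, whatever order it arrives in. */
    public void onResponse(ByteBuffer frame) {
        int id = frame.getInt();
        byte[] body = new byte[frame.getInt()];
        frame.get(body);
        CompletableFuture<byte[]> waiting = pending.remove(id);
        if (waiting != null) {
            waiting.complete(body);
        }
    }
}
```

Since the ID travels in every frame, a synchronous client (one outstanding 
call) and an async client (many) would read and write the same bytes.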

> do multiple simultaneous transfers use different connections?

The plan is to make Hadoop's Client and Server implement Avro's Transceiver 
interface, so the transport will be the same as Hadoop's current RPC transport. 
This caches a single connection per host, so simultaneous transfers from or to 
a host share a connection.  We'll see how this goes.
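
A rough sketch of the one-connection-per-host cache follows.  This is not 
Hadoop's Client code: ConnectionCache and its opener function are 
hypothetical, and the cache is keyed by a plain host string for simplicity 
(the real cache key may carry more than the host).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: at most one cached connection per host. */
public class ConnectionCache<C> {
    private final Map<String, C> byHost = new ConcurrentHashMap<>();
    private final Function<String, C> opener;

    /** opener is whatever actually dials the host (an assumption here). */
    public ConnectionCache(Function<String, C> opener) {
        this.opener = opener;
    }

    /** Return the cached connection for host, opening it on first use. */
    public C get(String host) {
        return byHost.computeIfAbsent(host, opener);
    }

    /** Number of hosts with a cached connection. */
    public int size() {
        return byHost.size();
    }
}
```

Two concurrent transfers to the same host both call get() with the same key 
and receive the same connection object, which is why they end up sharing it.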

> Server side : Datanode : Is the disk data fetched inside RPC handler?

If we use transferTo, the disk would be accessed in the Server.Responder 
thread; otherwise it would be accessed in a Server.Handler thread (of which 
there are many).  So if we implement transferTo, we'd probably also need to 
change Server to support more responder threads.  If we don't, I think the 
existing handler pool will work well.  When there are more active requests than 
spindles, the server will slow down, as expected.  A slow spindle will mostly 
affect only requests to that spindle, since data will be fully buffered before 
the connection is accessed.
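
The two read paths can be sketched as below.  BlockSender and its method 
names are hypothetical and this is not Datanode or Server code; it only 
contrasts buffering the data fully before touching the connection with 
handing the file to FileChannel.transferTo at write time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the two ways file data could reach the connection. */
public class BlockSender {
    /** Handler-style: read the range fully into memory; the disk is hit here. */
    public static ByteBuffer readFully(Path file, long offset, int length)
            throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = in.read(buf, pos);
                if (n < 0) {
                    break; // end of file before the range was filled
                }
                pos += n;
            }
            buf.flip();
            return buf;
        }
    }

    /** Responder-style: the disk is hit while writing to the connection. */
    public static long sendZeroCopy(Path file, long offset, long length,
                                    WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < length) {
                long n = in.transferTo(offset + sent, length - sent, out);
                if (n <= 0) {
                    break; // simplification: stop at EOF or if nothing was accepted
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

In the first path a slow disk stalls only the handler thread doing the read.  
In the second, the stall lands on whichever thread writes to the socket, 
which is why the number of responder threads would matter.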

> benchmark bulk data
> -------------------
>
>                 Key: AVRO-24
>                 URL: https://issues.apache.org/jira/browse/AVRO-24
>             Project: Avro
>          Issue Type: Task
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.0.0
>
>
> It would be good to validate that the RPC wire format is capable of 
> transmitting bulk data efficiently.  In particular, to be used for HDFS file 
> access, it must be able to, when including file data in an RPC response, or 
> writing file data in an RPC request:
>  - saturate a disk's throughput or a network interface; and
>  - not consume much CPU.
> In other words, Avro's RPC should not be a bottleneck in the transfer of file 
> data from a remote disk to an application or vice versa, and moreover it 
> should leave the vast majority of the CPU for the application.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
