[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: hdfs-918-20100203.patch

New patch.  Streamlined MultiplexedBlockSender: we now have one selector per 
worker thread and no BlockingQueues; writable connections are handled inline 
by each thread as they become available.
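
For anyone who wants the general shape without opening the patch, here is a 
rough sketch of the selector-per-worker idea. This is NOT the code in the 
attachment: the class name SelectorWorker, its send() and shutdown() methods, 
the ConcurrentLinkedQueue used to hand new connections to the worker, and 
closing each channel once its buffer is drained are all assumptions made for 
illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical worker: owns one Selector and writes to ready channels inline. */
final class SelectorWorker implements Runnable {
    private final Selector selector;
    // Assumption: a non-blocking queue hands new connections to the worker,
    // so registration always happens on the worker's own thread.
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    SelectorWorker() {
        try {
            selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Queue a channel and the bytes to send on it, then wake the worker. */
    public <C extends SelectableChannel & WritableByteChannel> void send(C ch, ByteBuffer data) {
        pending.add(() -> {
            try {
                ch.configureBlocking(false);
                ch.register(selector, SelectionKey.OP_WRITE, data);
            } catch (IOException e) {
                closeQuietly(ch);
            }
        });
        selector.wakeup();
    }

    public void shutdown() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select();
                // Register any channels handed over since the last pass.
                Runnable registration;
                while ((registration = pending.poll()) != null) {
                    registration.run();
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isWritable()) {
                        writeInline(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    /** Write as much as the channel accepts right now; no hand-off to another thread. */
    private void writeInline(SelectionKey key) {
        ByteBuffer data = (ByteBuffer) key.attachment();
        try {
            ((WritableByteChannel) key.channel()).write(data);
            if (!data.hasRemaining()) {
                // Simplification: close the channel once everything is sent.
                key.cancel();
                key.channel().close();
            }
        } catch (IOException e) {
            key.cancel();
            closeQuietly(key.channel());
        }
    }

    private static void closeQuietly(Channel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }
}
```

With N workers you would start N of these, each on its own thread, and spread 
incoming connections across them; a partially written buffer simply stays 
attached to its key until the channel is writable again.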

Includes a utility class to read a file with a bunch of threads and time them.
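
To give an idea of what that kind of timing harness looks like, here is a 
minimal sketch. It is not the utility class from the patch: the name 
ThreadedReadTimer, the Result holder, and the 64 KB buffer size are 
assumptions, and it reads a local file through java.nio.file rather than 
through an HDFS client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: N threads each read the same file once and time it. */
final class ThreadedReadTimer {

    /** Bytes read and wall-clock nanoseconds for one reader thread. */
    static final class Result {
        public final long bytes;
        public final long nanos;

        Result(long bytes, long nanos) {
            this.bytes = bytes;
            this.nanos = nanos;
        }
    }

    /** Start `threads` concurrent readers and collect one Result per reader. */
    public static List<Result> timeReads(Path file, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Result>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Callable<Result> task = () -> readOnce(file);
                futures.add(pool.submit(task));
            }
            List<Result> results = new ArrayList<>();
            for (Future<Result> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    /** Read the file start to finish, counting bytes and elapsed time. */
    private static Result readOnce(Path file) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return new Result(total, System.nanoTime() - start);
    }
}
```

Calling ThreadedReadTimer.timeReads(path, 15) would roughly mirror the 
15-reader case; per-thread throughput is bytes divided by nanos for each 
Result.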

Ran some ad hoc jobs on my laptop and got similar performance to the existing 
BlockSender: slightly faster for a single file and slightly slower for 15 
competing localhost threads, which is exactly the opposite of what I boldly 
predicted.  I read somewhere that Java thread priorities are disabled on Linux 
because setting them requires root, so priority is ignored.  If that's the 
case, maybe running more threads is actually an advantage when all the readers 
are local and you're directly competing with them for CPU: you compete more 
effectively for limited resources with more threads.

I'm going to try to write an MR job to run some different scenarios on a 
cluster soon (thundering herd, steady medium load, large number of idle 
connections, individual read).  I think the architecture here is better suited 
to large numbers of connections, so if it did OK under a small number, then 
great.  I'll be pretty busy for the next month or so but will try to get this 
running in a cluster at some point and report some more interesting numbers.

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
