Re: hadoop download performance when user app adopts multi-thread

Samuel Guo Tue, 08 Jul 2008 19:17:48 -0700

heyongqiang wrote:
> An ipc.Client object is designed to be shared across threads, and each
> thread can only make synchronous RPC calls, which means each thread makes a
> call and waits for a result or an error. This is implemented by a novel
> technique: each thread makes a distinct call (with its own call object), and
> the user thread then waits on its call object, which is later notified by
> the connection's receiver thread. The user thread makes a call by first
> adding its call object to the call list, which is later used by the response
> receiver, and then synchronizes on the connection's socket output stream
> while waiting to write its call out. The connection's thread runs to collect
> responses on behalf of all user threads.
> What I have not mentioned yet is that Client also maintains a connection
> table.
> In every Client object, a connection culler runs in the background as a
> daemon thread whose sole purpose is to remove idle connections from the
> connection table.
> However, it seems that this culler thread does not close the socket
> associated with the connection; it only sets a mark and does a notify. All
> the cleanup is handled by the connection thread itself. This is really a
> wonderful design! Even though the culler thread can cull the connection from
> the table, the connection thread also includes removal code. That is because
> there is a chance that the connection thread will encounter an exception.
>
> The above is a brief summary of my understanding of hadoop's ipc code.
> Below is a test result measuring the data throughput of hadoop:
> +--------------+------------------+
> | threadCounts | avg(averageRate) |
> +--------------+------------------+
> |            1 |   53030539.48913 |
> |            2 |  35325499.583756 |
> |            3 |  24998284.969072 |
> |            4 |   19824934.28125 |
> |            5 |  15956391.489583 |
> |            6 |  15948640.175532 |
> |            7 |  14623977.375691 |
> |            8 |  16098080.160131 |
> |            9 |  8967970.3877005 |
> |           10 |  14569087.178947 |
> |           11 |  8962683.6662088 |
> |           12 |  20063735.297872 |
> |           13 |  13174481.053977 |
> |           14 |  10137907.034188 |
> |           15 |  6464513.2013889 |
> |           16 |   23064338.76087 |
> |           17 |   18688537.44385 |
> |           18 |  18270909.854317 |
> |           19 |  13086261.536538 |
> |           20 |  10784059.367347 |
> +--------------+------------------+
>
> The first column is the thread count of my test application, and the
> second column is the average download rate. It seems the download rate drops
> sharply when the thread count increases.
> This is a very simple test application. Can anyone tell me why? Where is the
> bottleneck when the user app uses multiple threads?
>
>
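
The call/wait/notify pattern described in the quoted message can be
illustrated with a small sketch. This is not Hadoop's ipc.Client code; the
names SketchClient, Call, wire and out are hypothetical, and an in-memory
queue that echoes requests back stands in for the socket and the server.
It only shows how several user threads can share one client while a single
receiver thread wakes each caller through its own call object.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the pattern; NOT Hadoop's actual ipc.Client.
class SketchClient {
    // One Call per in-flight request; the calling thread waits on it.
    static final class Call {
        final int id;
        final String param;
        private String value;
        private boolean done;

        Call(int id, String param) { this.id = id; this.param = param; }

        // Invoked by the receiver thread when the response arrives.
        synchronized void complete(String v) { value = v; done = true; notify(); }

        // Invoked by the user thread; the loop guards against spurious wakeups.
        synchronized String await() throws InterruptedException {
            while (!done) wait();
            return value;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, Call> calls = new ConcurrentHashMap<>();
    // Assumption: stands in for the socket; every id written comes back as a response.
    private final BlockingQueue<Integer> wire = new LinkedBlockingQueue<>();
    // Assumption: stands in for the lock on the socket output stream.
    private final Object out = new Object();

    SketchClient() {
        Thread receiver = new Thread(() -> {
            try {
                while (true) {
                    Integer id = wire.take();          // "read" one response
                    Call call = calls.remove(id);      // find the waiting call
                    if (call != null) call.complete("echo:" + call.param);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "receiver");
        receiver.setDaemon(true);
        receiver.start();
    }

    // Synchronous call: register the call, write it out, then block until notified.
    String call(String param) {
        Call call = new Call(nextId.getAndIncrement(), param);
        calls.put(call.id, call);                      // register before sending
        synchronized (out) { wire.add(call.id); }      // writers are serialized here
        try {
            return call.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

In this sketch all writers contend for the single `out` lock and all
responses pass through one receiver thread, which mirrors the sharing
described above but says nothing about where the real bottleneck is.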


As you know, a block of a file in HDFS is stored as a file in the
local filesystem of a datanode.
Different threads reading different files in HDFS, or different blocks of
the same file, may result in a burst of read requests against different
local files (blocks of HDFS files) on a particular datanode. The disk
seek time and I/O load then become heavy, and the response time
gets longer.
But this is only the local behavior of a single datanode. The overall
throughput of the Hadoop cluster should still be good.
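
To make the difference between per-thread rate and overall throughput
concrete, here is a rough sketch. It assumes that the second column of the
quoted table is the mean rate seen by each thread; the original post does
not say whether the figure is per thread or which unit it uses, so treat
this only as an illustration. The class AggregateRate is hypothetical; the
sample values are copied from the table.

```java
// Hypothetical helper; assumes the table reports a mean PER-THREAD rate.
class AggregateRate {
    // Aggregate rate if every one of `threads` threads sustains `perThreadRate`.
    static double aggregate(int threads, double perThreadRate) {
        return threads * perThreadRate;
    }

    public static void main(String[] args) {
        // A few rows copied from the quoted table (thread count, average rate).
        int[] threads = {1, 2, 4, 8, 16};
        double[] rate = {53030539.48913, 35325499.583756, 19824934.28125,
                         16098080.160131, 23064338.76087};
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: per-thread %.1f, aggregate %.1f%n",
                    threads[i], rate[i], aggregate(threads[i], rate[i]));
        }
    }
}
```

If that assumption holds, a falling per-thread rate can coexist with a
rising aggregate rate; if the column is already an aggregate, this
calculation does not apply.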

So, can you supply more information about your test?
> heyongqiang
> 2008-06-20
>
>
