heyongqiang wrote:
> An ipc.Client object is designed to be shared across threads. Each thread
> can only make synchronous RPC calls, which means each thread makes a call
> and waits for a result or an error. This is implemented with a novel
> technique: each thread makes a distinct call (with its own call object) and
> then waits on that call object, which is later notified by the connection's
> receiver thread. A user thread makes a call by first adding its call object
> to the call list, which the response receiver uses later, and then
> synchronizing on the connection's socket output stream while it writes its
> call out. The connection's thread runs to collect responses on behalf of all
> user threads.
>
> What I have not mentioned is that Client actually maintains a connection
> table. In every Client object, a connection culler runs in the background as
> a daemon; its sole purpose is to remove idle connections from the connection
> table. It seems that this culler thread does not close the socket associated
> with the connection; it only sets a mark and does a notify. All the cleanup
> is handled by the connection thread itself. This is really a wonderful
> design! Even though the culler thread can cull the connection from the
> table, the connection thread also includes removal code. That's because
> there is a chance that the connection thread will encounter an exception.
>
> The above is a brief summary of my understanding of Hadoop's IPC code.
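To make the call-and-wait scheme described above concrete, here is a minimal Java sketch. It is not Hadoop's actual ipc.Client code: the names `CallSketch`, `Call`, `wire` and `out` are hypothetical, and an in-memory queue stands in for the socket and the remote server, which simply echoes the request back.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the pattern; not taken from Hadoop's source.
final class CallSketch {
    /** One pending request. The calling thread waits on this object. */
    static final class Call {
        final int id;
        final String request;
        private String response;
        private boolean done;

        Call(int id, String request) {
            this.id = id;
            this.request = request;
        }

        /** Run by the receiver thread: store the result and wake the caller. */
        synchronized void complete(String r) {
            response = r;
            done = true;
            notify();
        }

        /** Run by the user thread: block until complete() has been called. */
        synchronized String await() throws InterruptedException {
            while (!done) {
                wait();
            }
            return response;
        }
    }

    // Pending calls, keyed by id, so the receiver can find the waiting caller.
    private final Map<Integer, Call> calls = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    // Assumption: this queue stands in for the socket plus an echoing server.
    private final BlockingQueue<Integer> wire = new LinkedBlockingQueue<>();
    // Stands in for the socket output stream that writers synchronize on.
    private final Object out = new Object();

    CallSketch() {
        // A single receiver thread collects responses for all user threads.
        Thread receiver = new Thread(() -> {
            try {
                while (true) {
                    int id = wire.take();
                    Call c = calls.remove(id);
                    if (c != null) {
                        c.complete("echo:" + c.request);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "receiver");
        receiver.setDaemon(true);
        receiver.start();
    }

    /** Synchronous call: register the call, write it out, wait for the reply. */
    String call(String request) {
        Call c = new Call(nextId.getAndIncrement(), request);
        calls.put(c.id, c);      // register before sending, so no reply is lost
        synchronized (out) {     // one writer at a time on the shared "stream"
            wire.add(c.id);
        }
        try {
            return c.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Any number of threads can share one `CallSketch` and call `call(...)` concurrently; each blocks only on its own `Call` object, while writes are serialized on `out`.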
> Below is a test result from measuring the data throughput of Hadoop:
>
> +--------------+------------------+
> | threadCounts | avg(averageRate) |
> +--------------+------------------+
> |            1 |   53030539.48913 |
> |            2 |  35325499.583756 |
> |            3 |  24998284.969072 |
> |            4 |   19824934.28125 |
> |            5 |  15956391.489583 |
> |            6 |  15948640.175532 |
> |            7 |  14623977.375691 |
> |            8 |  16098080.160131 |
> |            9 |  8967970.3877005 |
> |           10 |  14569087.178947 |
> |           11 |  8962683.6662088 |
> |           12 |  20063735.297872 |
> |           13 |  13174481.053977 |
> |           14 |  10137907.034188 |
> |           15 |  6464513.2013889 |
> |           16 |   23064338.76087 |
> |           17 |   18688537.44385 |
> |           18 |  18270909.854317 |
> |           19 |  13086261.536538 |
> |           20 |  10784059.367347 |
> +--------------+------------------+
>
> The first column is the thread count of my test application; the second
> column is the average download rate. The rate seems to drop sharply as the
> thread count increases.
> This is a very simple test application. Can anyone tell me why? Where is the
> bottleneck when a user application uses multiple threads?
>
As you know, a block of a file in HDFS is stored as a file in the local
filesystem of a datanode. Different threads reading different files in HDFS,
or different blocks of the same file, may cause a burst of read requests
against different local files (blocks of HDFS files) on a particular datanode.
Disk seek time and I/O load then become heavy, and the response time gets
longer. But this is only the local behavior of a single datanode; the overall
throughput of the Hadoop cluster will still be good. So, can you supply more
information about your test?

> heyongqiang
> 2008-06-20
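One way to read the quoted table is sketched below in Java. It rests on an assumption the original post does not confirm: that avg(averageRate) is a per-thread figure, so the aggregate rate is the thread count times that value. The post also gives no units. The class name `AggregateRate` is hypothetical, and only a few rows of the table are included.

```java
// Hypothetical helper; assumes averageRate is measured per thread.
final class AggregateRate {
    /** Aggregate rate if every thread achieves the given average rate. */
    static double aggregate(int threads, double avgRatePerThread) {
        return threads * avgRatePerThread;
    }

    public static void main(String[] args) {
        // A few rows copied from the quoted table (units not stated in the post).
        int[] threads = {1, 2, 4, 8, 16};
        double[] avgRate = {
            53030539.48913, 35325499.583756, 19824934.28125,
            16098080.160131, 23064338.76087
        };
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: per-thread %.0f, aggregate %.0f%n",
                    threads[i], avgRate[i], aggregate(threads[i], avgRate[i]));
        }
    }
}
```

If the assumption holds, a falling per-thread rate does not by itself mean the total throughput is falling; if averageRate is already a total across threads, this reading does not apply.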
