hehe
I noticed that in DFSClient's DataStreamer thread, the run method sends
data out while synchronized on dataQueue. Is this really needed?
I mean, the remove, wait, and getFirst calls on dataQueue should be synchronized on
dataQueue, but does it need to hold the lock while sending a packet out?
I doubt it. Can any developer give me a reason for doing that?
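A minimal sketch of the narrower locking the question proposes: the streamer holds the dataQueue lock only while waiting for a packet and removing it, then sends with the lock released. Packet, StreamerSketch, enqueue() and send() are hypothetical names. The `sent` list stands in for the socket. This is not DFSClient's actual code, and whether DFSClient relies on the wider lock elsewhere is exactly what the question asks.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a DFSClient packet; not Hadoop's class.
class Packet {
    final int seqno;
    Packet(int seqno) { this.seqno = seqno; }
}

// Streamer that locks dataQueue only to wait/dequeue; the send runs unlocked.
class StreamerSketch implements Runnable {
    private final LinkedList<Packet> dataQueue = new LinkedList<>();
    private final List<Integer> sent = new ArrayList<>(); // stands in for the socket
    private boolean closed = false; // guarded by dataQueue

    // Producer side: called by the writer thread.
    void enqueue(Packet p) {
        synchronized (dataQueue) {
            dataQueue.addLast(p);
            dataQueue.notifyAll();
        }
    }

    void close() {
        synchronized (dataQueue) {
            closed = true;
            dataQueue.notifyAll();
        }
    }

    @Override
    public void run() {
        while (true) {
            Packet one;
            synchronized (dataQueue) {
                while (dataQueue.isEmpty() && !closed) {
                    try {
                        dataQueue.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                if (dataQueue.isEmpty()) {
                    return; // closed and fully drained
                }
                one = dataQueue.removeFirst();
            }
            // Lock released: the writer can keep enqueuing while this send runs.
            send(one);
        }
    }

    // Hypothetical network write; only the single streamer thread calls it.
    private void send(Packet p) {
        synchronized (sent) { sent.add(p.seqno); }
    }

    List<Integer> sentSeqnos() {
        synchronized (sent) { return new ArrayList<>(sent); }
    }
}
```

The trade-off is that a writer calling enqueue() no longer blocks for the duration of a send. The cost is that the packet being sent is no longer visible in dataQueue while the send is in progress. With a single streamer thread, packet order is still preserved.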
heyongqiang
2008-06-23
From: hong
Sent: 2008-06-21 10:10:59
To: [email protected]
Cc:
Subject: Re: understanding of client connection code
Brother, are you on 余海燕's team?
On 2008-6-20, at 5:00 PM, heyongqiang wrote:
> ipc.Client objects are designed to be shared across threads, and
> each thread can only make synchronous RPC calls, which means each
> thread issues a call and waits for a result or an error. This is
> implemented with a novel technique: each thread makes a distinct call
> (with its own Call object) and then waits on that Call object, which
> is later notified by the connection's receiver thread. A user thread
> makes a call by first adding its Call object to the call list that
> the response receiver later consults, and then synchronizing on the
> connection's socket output stream while waiting to write its call out.
> Meanwhile, the connection's thread keeps running to collect responses
> on behalf of all user threads.
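The call/wait pattern described above can be sketched as follows. The sketch assumes an in-process BlockingQueue in place of the socket, with a trivial echo "server" folded into the receiver thread. RpcClientSketch, Call, call() and await() are hypothetical names chosen for illustration, not Hadoop's ipc.Client API.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: many user threads share one client and one receiver thread.
class RpcClientSketch {
    static final class Call {
        final int id;
        final String param;
        private String value;
        private boolean done;

        Call(int id, String param) { this.id = id; this.param = param; }

        // Called by the receiver thread when the response for this id arrives.
        synchronized void complete(String v) {
            value = v;
            done = true;
            notify(); // only the owning user thread waits on this object
        }

        // Called by the user thread; blocks until complete() has run.
        synchronized String await() throws InterruptedException {
            while (!done) {
                wait();
            }
            return value;
        }
    }

    private final AtomicInteger counter = new AtomicInteger();
    private final Map<Integer, Call> calls = new ConcurrentHashMap<>();
    // Simulated socket: requests written by user threads, read by the receiver.
    private final BlockingQueue<Call> socket = new LinkedBlockingQueue<>();
    private final Object outLock = new Object(); // stands in for the output stream lock

    RpcClientSketch() {
        // One receiver thread collects responses on behalf of every user thread.
        Thread receiver = new Thread(() -> {
            try {
                while (true) {
                    Call req = socket.take();
                    Call c = calls.remove(req.id);
                    if (c != null) {
                        c.complete("echo:" + req.param); // simulated server reply
                    }
                }
            } catch (InterruptedException e) {
                // shut down quietly
            }
        });
        receiver.setDaemon(true);
        receiver.start();
    }

    // Synchronous RPC as seen by one user thread.
    String call(String param) throws InterruptedException {
        Call c = new Call(counter.incrementAndGet(), param);
        calls.put(c.id, c);      // register first so the reply can find it
        synchronized (outLock) { // serialize writes on the shared stream
            socket.put(c);
        }
        return c.await();        // wait on this thread's own Call object
    }
}
```

Because each thread waits on its own Call object, the receiver can wake exactly one thread per response with notify() instead of waking every caller.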
>
> What I have not mentioned is that Client actually maintains a
> connection table. In every Client object, a connection culler runs
> in the background as a daemon thread whose sole purpose is to remove
> idle connections from the connection table. However, this culler
> thread does not close the socket associated with the connection; it
> only sets a mark and calls notify. All the cleanup is handled by the
> connection thread itself. This is a really wonderful design! Even
> though the culler thread can remove a connection from the table, the
> connection thread also contains removal code, because the connection
> thread may encounter an exception.
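Here is a minimal sketch of the connection table and culler behaviour described above. The culler only removes an idle connection from the table, sets a flag and notifies; the connection's own thread does the cleanup, with the socket close simulated by a flag. All class and method names are hypothetical. The real culler runs periodically, whereas cullOnce() here does a single pass.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of a per-client connection table with a culler.
class ConnectionTableSketch {
    final class Connection implements Runnable {
        final String address;
        final Thread thread;
        private long lastActivity = System.currentTimeMillis();
        private boolean shouldClose = false;
        volatile boolean socketClosed = false;

        Connection(String address) {
            this.address = address;
            this.thread = new Thread(this, "connection-" + address);
            this.thread.setDaemon(true);
        }

        synchronized void touch() { lastActivity = System.currentTimeMillis(); }

        synchronized boolean idleFor(long ms) {
            return System.currentTimeMillis() - lastActivity >= ms;
        }

        // Called by the culler: only mark and notify, no socket work here.
        synchronized void markClosed() {
            shouldClose = true;
            notifyAll();
        }

        @Override
        public void run() {
            synchronized (this) {
                while (!shouldClose) {
                    try { wait(); } catch (InterruptedException e) { break; }
                }
            }
            close(); // no lock on this object is held here, avoiding lock-order issues
        }

        // All cleanup runs on the connection's own thread.
        private void close() {
            synchronized (connections) {
                // Also covers exits caused by errors; remove(key, value) will not
                // evict a newer connection registered for the same address.
                connections.remove(address, this);
            }
            socketClosed = true; // stands in for socket.close()
        }
    }

    final Map<String, Connection> connections = new HashMap<>();

    Connection getConnection(String address) {
        synchronized (connections) {
            Connection c = connections.get(address);
            if (c == null) {
                c = new Connection(address);
                connections.put(address, c);
                c.thread.start();
            }
            c.touch();
            return c;
        }
    }

    // One culler pass: drop idle connections from the table and signal them.
    void cullOnce(long maxIdleMs) {
        synchronized (connections) {
            Iterator<Connection> it = connections.values().iterator();
            while (it.hasNext()) {
                Connection c = it.next();
                if (c.idleFor(maxIdleMs)) {
                    it.remove();
                    c.markClosed();
                }
            }
        }
    }
}
```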
>
> The above is a brief summary of my understanding of Hadoop's IPC
> code.
> Below is the result of a test I used to measure Hadoop's data
> throughput:
> +--------------+------------------+
> | threadCounts | avg(averageRate) |
> +--------------+------------------+
> | 1 | 53030539.48913 |
> | 2 | 35325499.583756 |
> | 3 | 24998284.969072 |
> | 4 | 19824934.28125 |
> | 5 | 15956391.489583 |
> | 6 | 15948640.175532 |
> | 7 | 14623977.375691 |
> | 8 | 16098080.160131 |
> | 9 | 8967970.3877005 |
> | 10 | 14569087.178947 |
> | 11 | 8962683.6662088 |
> | 12 | 20063735.297872 |
> | 13 | 13174481.053977 |
> | 14 | 10137907.034188 |
> | 15 | 6464513.2013889 |
> | 16 | 23064338.76087 |
> | 17 | 18688537.44385 |
> | 18 | 18270909.854317 |
> | 19 | 13086261.536538 |
> | 20 | 10784059.367347 |
> +--------------+------------------+
>
> The first column is the number of threads in my test application;
> the second column is the average download rate. It seems the rate
> drops sharply as the thread count increases. This is a very simple
> test application. Can anyone tell me why? Where is the bottleneck
> when a user application uses multiple threads?
>
> heyongqiang
> 2008-06-20
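Assume each avg(averageRate) value in the quoted table is a per-thread average. The message does not state this explicitly, nor does it give the unit. Under that assumption, multiplying by the thread count gives the implied aggregate rate. For example, 2 × 35,325,499.58 ≈ 70.65 million, versus 53.03 million for one thread. By this arithmetic, the aggregate rises from 1 through 8 threads (about 128.8 million at 8) before becoming erratic. A falling per-thread rate therefore does not, on its own, show that total throughput dropped. The small program below is only a calculator over the numbers in the table; AggregateRate is a hypothetical name.

```java
// Per-thread averages copied from the table above (unit as reported by the test app).
class AggregateRate {
    static final double[] PER_THREAD = {
        53030539.48913, 35325499.583756, 24998284.969072, 19824934.28125,
        15956391.489583, 15948640.175532, 14623977.375691, 16098080.160131,
        8967970.3877005, 14569087.178947, 8962683.6662088, 20063735.297872,
        13174481.053977, 10137907.034188, 6464513.2013889, 23064338.76087,
        18688537.44385, 18270909.854317, 13086261.536538, 10784059.367347
    };

    // Aggregate = thread count x per-thread average (assumes rows are per-thread means).
    static double aggregate(int threads) {
        return threads * PER_THREAD[threads - 1];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= PER_THREAD.length; n++) {
            System.out.printf("%2d threads: per-thread %,.0f, aggregate %,.0f%n",
                    n, PER_THREAD[n - 1], aggregate(n));
        }
    }
}
```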