Actually, they listen here too, and this is a basic question...

I'm not an expert, but how does having multiple threads really help this 
problem?

I'm assuming you're talking about a map/reduce job, and not some specific 
client code being run on a machine outside of the cloud/cluster...

I wasn't aware that you could easily synchronize threads running on different 
JVMs. ;-)

Your parallelism comes from multiple tasks running on different nodes within 
the cloud. By default you get one map task per block (one input split per 
block). You can write your own splitter (InputFormat) to increase this and get 
more parallelism.
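
For example, with the new mapreduce API you can cap the split size below the 
block size, so each block gets carved into several splits and hence several 
map tasks. A rough sketch (the 16 MB figure is just for illustration; check 
the class names against your Hadoop release):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MoreSplits {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "more-splits");  // Job.getInstance(conf) on newer releases
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Cap splits at 16 MB: a 64 MB block then yields ~4 splits,
    // i.e. ~4 map tasks instead of 1.
    FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
    // ... set mapper/reducer/output as usual, then:
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}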

HTH

-Mike


-----Original Message-----
From: Hemanth Yamijala [mailto:[email protected]] 
Sent: Friday, July 02, 2010 2:56 AM
To: [email protected]
Subject: Re: Why single thread for HDFS?

Hi,

Can you please post this on [email protected]? I suspect the
most qualified people to answer this question would all be on that
list.

Hemanth

On Fri, Jul 2, 2010 at 11:43 AM, elton sky <[email protected]> wrote:
> I guess this question was ignored, so I'm just posting it again.
>
> From my understanding, HDFS uses a single thread to do reads and writes.
> Since a file is composed of many blocks, and each block is stored as a file
> in the underlying FS, we could parallelize on a per-block basis.
> When reading across multiple blocks, threads could be used to read all the
> blocks at once. When writing, we could calculate the offset of each block
> and write to all of them simultaneously.
>
> Is this right?
>
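
On the read side of the quoted question: the HDFS client does expose 
positioned reads (FSDataInputStream implements PositionedReadable), which 
don't move the shared stream's offset, so a single client process can fan out 
one thread per block. A rough sketch, with the path taken from the command 
line and minimal error handling; check the thread-safety notes for your HDFS 
version before relying on this. The write side is a different story: HDFS 
files are write-once with a single writer, so concurrent writes at per-block 
offsets aren't supported.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelBlockRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);
    FileStatus st = fs.getFileStatus(file);
    long blockSize = st.getBlockSize();
    long fileLen = st.getLen();

    final FSDataInputStream in = fs.open(file);
    List<Thread> threads = new ArrayList<Thread>();
    for (long off = 0; off < fileLen; off += blockSize) {
      final long pos = off;
      final int len = (int) Math.min(blockSize, fileLen - off);
      Thread t = new Thread(new Runnable() {
        public void run() {
          byte[] buf = new byte[len];
          try {
            // Positioned read: leaves the shared stream's file
            // pointer alone, so concurrent calls don't collide.
            in.readFully(pos, buf, 0, len);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) t.join();
    in.close();
  }
}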

