If all you want is a faster -cp option, and you know your initial block list
and locations, you need to generate the target block list, then create one
thread per block and process each block in its own thread.
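
To make the "generate the block list" step concrete, here is a rough sketch in
Python. It is not the HDFS API; block_ranges is a name I made up, and it
assumes you already know the file size and the block size. It only works out
which (offset, length) byte range each block covers, so each range can be
handed to its own thread.

```python
def block_ranges(file_size, block_size):
    """Return one (offset, length) pair per block of the file.

    Hypothetical helper for illustration; a real HDFS client would get
    block offsets and lengths from the namenode instead of computing them.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Every block is block_size long except possibly the last one.
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]
```

The last block is usually short, which is why the length is carried along
with the offset instead of being assumed.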

You don't need to use the local disk; just read/write each block in 'paged'
increments (pages as in 4/16/32/64K page sizes).
This removes the i/o argument raised by another poster.
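
A minimal sketch of the thread-per-block, page-at-a-time copy, again in
Python. Ordinary local files stand in for the HDFS source and target here,
purely so the example runs; the function names and the 64K default are my own
assumptions, not anything in Hadoop. Each thread streams its block one page at
a time, so no more than one page per thread is held in memory and nothing is
staged anywhere else.

```python
import os
import threading

PAGE_SIZE = 64 * 1024  # assumed default; any of 4/16/32/64K would do


def copy_range_paged(src_path, dst_path, offset, length, page_size=PAGE_SIZE):
    """Copy `length` bytes starting at `offset`, one page at a time."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            page = src.read(min(page_size, remaining))
            if not page:  # source ended early; stop rather than spin
                break
            dst.write(page)
            remaining -= len(page)


def parallel_copy(src_path, dst_path, block_size, page_size=PAGE_SIZE):
    """Copy a file using one thread per block (hypothetical stand-in for -cp)."""
    size = os.path.getsize(src_path)
    # Create the target at full size so each thread can write at its own offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(size)
    threads = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        t = threading.Thread(
            target=copy_range_paged,
            args=(src_path, dst_path, offset, length, page_size),
        )
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

Whether this actually beats a single-threaded copy depends on where the
blocks live and how busy the nodes are, so treat it as something to measure
rather than a guaranteed win.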

This may be faster than the current process.

HTH

-Mike

> Date: Tue, 6 Jul 2010 13:46:34 +1000
> Subject: Re: Why single thread for HDFS?
> From: [email protected]
> To: [email protected]
> 
> >Basically, your point is that hadoop dfs -cp is relatively slow and could
> be made faster.  If HDFS had a more multi-threaded design, it would make cp
> operations faster.
> What I mean is, if we have the size of a file we can parallel by calculating
> blocks. Otherwise we couldn't.
> 
> 
> On Tue, Jul 6, 2010 at 10:47 AM, Allen Wittenauer
> <[email protected]>wrote:
> 
> >
> > On Jul 5, 2010, at 5:01 PM, elton sky wrote:
> > > Well, this sounds good when you have many small files, you concat() them
> > > into a big one. I am talking about split a big file into blocks and copy
> > all
> > > a few blocks in parallel.
> >
> > Basically, your point is that hadoop dfs -cp is relatively slow and could
> > be made faster.  If HDFS had a more multi-threaded design, it would make cp
> > operations faster.
> >
> > This sounds like a particularly high cost for an operation that is rarely
> > utilized.  [This is much more interesting in a distcp context, but even then
> > not that great.  distcp in my experience is usually used to push a bunch of
> > files, so you get your parallelism at the file level.  Typically these are
> > part files are usually the same approx. size.]
> >
> >
> >
                                          