If all you want is a faster -cp option, and you know your initial block list and locations, then you need to generate the target block list, create one thread per block, and process each block in its own thread.
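To make the thread-per-block idea concrete, here is a rough sketch in Python. It is only an illustration under some loud assumptions: it copies between ordinary local files rather than going through the HDFS client API, the names `copy_block` and `parallel_copy` are made up for this example, and the block and page sizes are placeholder values. It also caps the number of threads with a pool instead of literally spawning one thread per block. Each worker moves its own byte range in small page-sized reads and writes.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size (illustrative only)
PAGE_SIZE = 64 * 1024          # assumed page size for each read/write


def copy_block(src, dst, offset, length, page_size=PAGE_SIZE):
    """Copy one block's byte range from src to dst in page-sized increments."""
    # Each worker opens its own handles so seeks do not interfere.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            page = fin.read(min(page_size, remaining))
            if not page:  # unexpected end of file
                break
            fout.write(page)
            remaining -= len(page)
    return length - remaining  # bytes actually copied


def parallel_copy(src, dst, block_size=BLOCK_SIZE, page_size=PAGE_SIZE,
                  max_workers=8):
    """Split src into blocks by size and copy the blocks concurrently."""
    size = os.path.getsize(src)
    # Pre-size the target so every worker can seek to its own offset.
    with open(dst, "wb") as fout:
        fout.truncate(size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(copy_block, src, dst, off,
                        min(block_size, size - off), page_size)
            for off in range(0, size, block_size)
        ]
        return sum(f.result() for f in futures)
```

Note that the block offsets can only be computed because the file size is known up front, which is the same condition raised in the quoted thread. Whether this beats a single-threaded copy on a real cluster depends on where the blocks live and on the network, so treat it as a starting point for measurement.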
You don't need to use the local disk; just read/write each block in 'paged' increments (pages as in 4/16/32/64K page sizes). This removes the I/O argument raised by another poster. This may be faster than the current process.

HTH

-Mike

> Date: Tue, 6 Jul 2010 13:46:34 +1000
> Subject: Re: Why single thread for HDFS?
> From: [email protected]
> To: [email protected]
>
> > Basically, your point is that hadoop dfs -cp is relatively slow and could
> > be made faster. If HDFS had a more multi-threaded design, it would make cp
> > operations faster.
>
> What I mean is, if we have the size of a file we can parallel by calculating
> blocks. Otherwise we couldn't.
>
>
> On Tue, Jul 6, 2010 at 10:47 AM, Allen Wittenauer
> <[email protected]> wrote:
>
> >
> > On Jul 5, 2010, at 5:01 PM, elton sky wrote:
> > > Well, this sounds good when you have many small files, you concat() them
> > > into a big one. I am talking about split a big file into blocks and copy all
> > > a few blocks in parallel.
> >
> > Basically, your point is that hadoop dfs -cp is relatively slow and could
> > be made faster. If HDFS had a more multi-threaded design, it would make cp
> > operations faster.
> >
> > This sounds like a particularly high cost for an operation that is rarely
> > utilized. [This is much more interesting in a distcp context, but even then
> > not that great. distcp in my experience is usually used to push a bunch of
> > files, so you get your parallelism at the file level. Typically these are
> > part files are usually the same approx. size.]
