The copyFromLocal command can take a directory as its source instead of individual files.
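For example, given the layout described in the question below (a rough sketch; it assumes all the logs sit under /input/path locally):

    bin/hadoop fs -copyFromLocal /input/path /hdfs/input/path

One invocation copies the whole directory tree, so the JVM startup cost is paid once no matter how many .log files there are. On the glob question: an unquoted "*" in "bin/hadoop fs -put /input/path/*.log ..." is expanded by the shell before hadoop ever starts, so a very large file count can indeed hit the shell's argument-length limit; passing the directory sidesteps that entirely.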
On Tue, Jun 30, 2009 at 7:22 PM, pavel kolodin <[email protected]> wrote:
>
> I have input data files in the local filesystem: /input/path/*.log, but I
> know nothing about their sizes, the number of files, etc. If the *.log
> files are small and there are lots of them, there is no reason to start
> "bin/hadoop fs -put" for each file, because each start of "bin/hadoop" is
> time-expensive.
>
> 1. What if I write "bin/hadoop fs -put /input/path/*.log /hdfs/input/path"?
> Will the "*" be passed to hadoop, which will then open the files one by
> one, or will the "*" be expanded by /bin/bash? If the latter, what happens
> if the expanded command line is too long for bash itself (more than 32768
> characters, for example, if there are too many files)?
> 2. I can merge many small files into "packs" (cat filename >> pack) of
> 100MB each and then put those into HDFS. But what if there are too many
> files and the total size of all the data is several GB? Then I would need
> free space on the HDD equal to (input data size) * 2 for that operation...
>
> thank you

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
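On question 2: the packs do not need to touch the local disk at all if they are streamed straight into HDFS. A rough sketch, assuming this version's "-put" accepts "-" to read from stdin as the FsShell docs describe (the pack file name is made up):

    cat /input/path/*.log | bin/hadoop fs -put - /hdfs/input/path/pack-001.log

This needs no extra local space. The shell's argument-length limit still applies to the glob here, though; piping "find /input/path -name '*.log'" through "xargs cat" avoids that as well.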
