I have input data files in the local filesystem: /input/path/*.log. I don't know anything in advance about their sizes, the number of files, etc. If the *.log files are small and there are lots of them, there is no point in starting "bin/hadoop fs -put" for each file, because every start of "bin/hadoop" is time-expensive.

1. What if I write "bin/hadoop fs -put /input/path/*.log /hdfs/input/path"? Will the "*" be passed to hadoop, so that hadoop opens all the files one by one, or will the "*" be expanded by /bin/bash? If it is the latter, what happens if the expanded command line becomes too long for bash itself (more than 32768 characters, for example, if there are too many files)? See the first sketch below.

2. I could merge many small files into "packs" (cat filename >> pack) of about 100 MB each and then put the packs into HDFS (see the second sketch below). But what if the number of files is very large and the total size of all the data is several GB? Then I would need free space on the HDD equal to (input data size) * 2 for that operation...
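For reference, a small sketch of the two forms of the command from point 1, using the hypothetical paths above. An unquoted pattern is expanded by the shell before "bin/hadoop" even starts; a quoted pattern reaches hadoop as the literal string, and hadoop can then do its own glob matching on it:

    # Unquoted: /bin/bash expands *.log into a (possibly very long)
    # argument list before bin/hadoop is started.
    bin/hadoop fs -put /input/path/*.log /hdfs/input/path

    # Quoted: the shell passes the pattern through unchanged, so hadoop
    # itself sees the literal string "/input/path/*.log".
    bin/hadoop fs -put '/input/path/*.log' /hdfs/input/path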
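And a minimal sketch of the "pack" approach from point 2, assuming the hypothetical paths above, a staging directory /tmp/packs, and a 100 MB target pack size. This is the variant that needs the extra local disk space mentioned above:

    # Concatenate small *.log files into ~100 MB packs on local disk,
    # then upload the packs; roughly (input size) of extra free space
    # is needed for the staging copies.
    PACK_DIR=/tmp/packs              # hypothetical staging directory
    PACK_SIZE=$((100 * 1024 * 1024)) # 100 MB per pack
    mkdir -p "$PACK_DIR"

    n=0
    size=0
    for f in /input/path/*.log; do
        cat "$f" >> "$PACK_DIR/pack_$n"
        size=$((size + $(stat -c %s "$f")))
        if [ "$size" -ge "$PACK_SIZE" ]; then
            n=$((n + 1))
            size=0
        fi
    done

    # One hadoop start per pack instead of one per small file.
    bin/hadoop fs -put "$PACK_DIR"/pack_* /hdfs/input/path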

Thank you.
