Hi Sunil,

Please check HarFileSystem (Hadoop Archive Filesystem); it should be useful for solving your problem.

Thanks
Devaraj

________________________________________
From: Sunil S Nandihalli [sunil.nandiha...@gmail.com]
Sent: Tuesday, April 24, 2012 7:12 PM
To: common-user@hadoop.apache.org
Subject: hadoop streaming and a directory containing large number of .tgz files

Hi Everybody,

I am a newbie to Hadoop. I have about 40K .tgz files, each approximately 3 MB. I would like to process them as if they were a single large file formed by

"cat list-of-files | gnuparallel 'tar -Oxvf {} | sed 1d' > output.txt"

How can I achieve this using hadoop-streaming or some other similar library?

Thanks,
Sunil.
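
For illustration only, the per-file step in the pipeline above (extract an archive to stdout, drop the first line) could be sketched in Python as a script that reads one .tgz path per input line, which is the shape a hadoop-streaming mapper usually takes. This is a rough sketch and not something proposed in the thread: the names cat_tgz and main are hypothetical, it assumes "sed 1d" is meant to drop the first line of each archive's extracted contents, and it assumes every path is readable as a local file by the task (fetching the archives from HDFS is not shown).

```python
import sys
import tarfile


def cat_tgz(path, out):
    """Write the contents of every regular file in one .tgz to `out`,
    skipping the first line of the archive's combined output
    (the assumed intent of `tar -Oxf ... | sed 1d`)."""
    first_line_pending = True
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and so on
            stream = archive.extractfile(member)
            for line in stream:  # lines are bytes
                if first_line_pending:
                    first_line_pending = False
                    continue
                out.write(line)


def main():
    # One archive path per input line, as a streaming mapper would receive it.
    for raw in sys.stdin:
        path = raw.strip()
        if path:
            cat_tgz(path, sys.stdout.buffer)

# To use this as a mapper script, call main() at the bottom of the file.
```

Run locally, "cat list-of-files | python script.py > output.txt" (with main() called at the end of the script) should behave much like the original pipeline, only without the parallelism.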