JJ,

If it is just a gzipped text file, then no. gzip does not allow splitting, 
because you cannot seek to an arbitrary point in the file, possibly skip ahead 
to a sync point, and start reading data from there. If it is a sequence file 
with gzip compression, then yes, because the sequence file format compresses 
the data in chunks rather than the entire file at once.
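To make the difference concrete, here is a small Python sketch. It is only an illustration under stated assumptions: it is not Hadoop code and not the real sequence file format. It compresses the same data two ways: as one gzip stream, like a plain .gz file, and as independently gzipped chunks whose starting offsets are recorded, standing in for sync points. The function names and the offset list are hypothetical.

```python
import gzip
import zlib
from typing import List, Optional, Tuple


def compress_whole(data: bytes) -> bytes:
    """One gzip stream over all the data, like a plain .gz file."""
    return gzip.compress(data)


def compress_chunked(data: bytes, chunk_size: int) -> Tuple[bytes, List[int]]:
    """Gzip fixed-size chunks independently and record where each one starts.

    Hypothetical layout: the recorded offsets play the role of sync points,
    so a reader can jump straight to any chunk. This is NOT the sequence
    file on-disk format, just the same idea in miniature.
    """
    out = bytearray()
    offsets: List[int] = []
    for i in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += gzip.compress(data[i:i + chunk_size])
    return bytes(out), offsets


def read_chunk(blob: bytes, offsets: List[int], k: int) -> bytes:
    """Decompress only chunk k, without touching any earlier chunk."""
    end = offsets[k + 1] if k + 1 < len(offsets) else len(blob)
    return gzip.decompress(blob[offsets[k]:end])


def try_read_whole_from(blob: bytes, offset: int) -> Optional[bytes]:
    """Try to start decoding a single gzip stream at an arbitrary offset.

    Returns None if decoding fails, which is the expected outcome because
    there is no header or sync point in the middle of the stream.
    """
    try:
        return gzip.decompress(blob[offset:])
    except (OSError, EOFError, zlib.error):
        return None


if __name__ == "__main__":
    data = b"".join(b"line %06d\n" % i for i in range(100_000))
    whole = compress_whole(data)
    chunked, offsets = compress_chunked(data, 100_000)
    # Any chunk can be decoded on its own.
    assert read_chunk(chunked, offsets, 3) == data[300_000:400_000]
    # Starting halfway into the single stream does not give usable data.
    print(try_read_whole_from(whole, len(whole) // 2))
```

With the chunked layout, read_chunk decodes chunk k from its own bytes alone, which is what allows separate readers to work on separate pieces in parallel. With the single stream, starting in the middle fails (or at best yields bytes that are not a clean piece of the input), so the whole file has to be read from the beginning by one reader.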

--Bobby Evans

On 6/23/11 1:21 AM, "Mapred Learn" <mapred.le...@gmail.com> wrote:

Hi,
If I have a big gzipped text file (~60 GB) in HDFS, can I split it into 
smaller chunks (~1 GB) so that I can run a MapReduce job on those files and 
have it finish faster than running the job on one big file?

Thanks,
-JJ
