JJ, if it is just gzipped, then no. gzip does not allow splitting, because you cannot seek to an arbitrary point in the file and then, possibly after moving to a sync point, start reading out the data. If it is a sequence file with gzip compression, then yes, because the sequence file format compresses the file in chunks, not the entire file at once.
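To illustrate the difference, here is a minimal Python sketch. It is not Hadoop code and does not use the real SequenceFile format; it only mimics the idea with the standard gzip module. The helper names (make_records, compress_whole, compress_in_chunks, can_read_from) and the 4096-byte chunk size are assumptions made up for this example.

```python
import gzip
import zlib


def make_records(n):
    """Build n fixed-width text records, standing in for a big text file."""
    return b"".join(b"record %06d\n" % i for i in range(n))


def compress_whole(data):
    """Compress everything as one gzip stream (the plain .gz case)."""
    return gzip.compress(data)


def compress_in_chunks(data, chunk_size):
    """Compress each chunk as its own gzip stream.

    This only imitates the chunk-wise compression idea; the real
    sequence file layout (headers, sync markers) is not reproduced.
    """
    return [
        gzip.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def can_read_from(blob, offset):
    """Return True if decompression can start at the given byte offset."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No valid gzip header at this offset, so a reader cannot start here.
        return False


if __name__ == "__main__":
    data = make_records(1000)

    whole = compress_whole(data)
    print("start in the middle of one gzip stream:",
          can_read_from(whole, len(whole) // 2))

    chunks = compress_in_chunks(data, 4096)
    # Any chunk can be handed to a separate worker and read on its own.
    print("first line of third chunk:",
          gzip.decompress(chunks[2]).splitlines()[0])
```

With a single gzip stream, a reader dropped into the middle has nothing to start from, so the whole 60 GB file ends up going to one mapper. With independently compressed chunks, each worker can decompress its own piece without touching the earlier ones.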
--Bobby Evans

On 6/23/11 1:21 AM, "Mapred Learn" <mapred.le...@gmail.com> wrote:

Hi,
If I have a big gzipped text file (~60 GB) in HDFS, can I split it into smaller chunks (~1 GB) so that I can run a map-red job on those files and finish faster than running the job on one big file?

Thanks,
-JJ