Hi there, I am storing web log files in HDFS and running a grep MapReduce job on them. The web log files are quite big (x GB per day), so I compress them. Unfortunately, if a file is compressed with gzip, Hadoop creates only one map task for it.
I know that if I split a web log file into pieces I can use multiple map tasks, but my customer requires that the number of files stay the same. So my plan is to write a ZipInputFormat class that works on ZipEntry objects and to modify the copyFromLocal function. What do you think of this plan? Any thoughts are greatly appreciated. Thanks.
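To make the idea concrete, below is a rough sketch (assuming the org.apache.hadoop.mapreduce API) of the kind of ZipInputFormat I have in mind: getSplits() creates one InputSplit per ZipEntry, so the entries inside a single ZIP archive can each be handled by their own map task while the number of HDFS files stays the same. All class names here (ZipEntryInputFormat, ZipEntryRecordReader) are my own placeholders, not existing Hadoop classes, and carrying the entry index in FileSplit's "start" field is just a shortcut for the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ZipEntryInputFormat extends FileInputFormat<Text, Text> {

    // One split per ZipEntry, so each entry gets its own map task.
    // The entry's index within the archive is smuggled into FileSplit's
    // "start" field; a real implementation would use a custom InputSplit.
    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> splits = new ArrayList<>();
        for (FileStatus status : listStatus(job)) {
            Path path = status.getPath();
            FileSystem fs = path.getFileSystem(job.getConfiguration());
            try (ZipInputStream zip = new ZipInputStream(fs.open(path))) {
                int index = 0;
                while (zip.getNextEntry() != null) {
                    splits.add(new FileSplit(path, index++, 0, new String[0]));
                }
            }
        }
        return splits;
    }

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        return new ZipEntryRecordReader();
    }

    /** Streams the lines of a single ZipEntry as (entryName, line) records. */
    public static class ZipEntryRecordReader extends RecordReader<Text, Text> {
        private BufferedReader reader;
        private final Text key = new Text();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Path path = split.getPath();
            long targetIndex = split.getStart();   // entry index, see getSplits()
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            ZipInputStream zip = new ZipInputStream(fs.open(path));
            ZipEntry entry = zip.getNextEntry();
            for (long i = 0; i < targetIndex && entry != null; i++) {
                entry = zip.getNextEntry();        // skip earlier entries
            }
            key.set(entry == null ? "" : entry.getName());
            reader = new BufferedReader(
                    new InputStreamReader(zip, StandardCharsets.UTF_8));
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            String line = reader.readLine();
            if (line == null) {
                return false;                      // end of this entry
            }
            value.set(line);
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() { return 0.0f; }
        @Override public void close() throws IOException { reader.close(); }
    }
}
```

One thing I am unsure about is that skipping to an entry with ZipInputStream still reads through the earlier entries, so this only gives real parallelism in I/O terms if the entries are reasonably large; comments on a better way to seek to an entry in HDFS would also be welcome.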
