Hi there,

I am storing web log files in HDFS and running a grep MapReduce job over them.
The log files are large (a few GB per day), so I compress them.
Unfortunately, if a file is compressed with gzip, Hadoop creates only one map
task for it, because a gzip stream is not splittable.

I know that if I split each web log file into pieces, I can use multiple map
tasks, but my customer requires that the number of files stay the same.

So my plan is to write a ZipInputFormat class that reads each ZipEntry of an
archive, and to modify the copyFromLocal function to match; a rough sketch of
the input format is below.
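
To make the idea concrete, here is a rough, untested sketch of the
ZipInputFormat I have in mind, written against the org.apache.hadoop.mapred
API (the class and field names are mine, nothing that exists in Hadoop
today). It reads every ZipEntry of an archive as lines of text. Note that a
zip archive, like gzip, cannot be split at arbitrary byte offsets, so this
version still produces one split per file; to get one map task per entry I
would additionally need a getSplits() that emits one split per ZipEntry.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: reads every ZipEntry of a .zip archive as lines of text.
public class ZipInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // a zip archive cannot be split at arbitrary byte offsets
    }

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        return new ZipRecordReader((FileSplit) split, conf);
    }

    static class ZipRecordReader implements RecordReader<LongWritable, Text> {
        private final ZipInputStream zip;
        private BufferedReader reader; // reader for the current entry, null between entries
        private long lineCount = 0;

        ZipRecordReader(FileSplit split, JobConf conf) throws IOException {
            FileSystem fs = split.getPath().getFileSystem(conf);
            zip = new ZipInputStream(fs.open(split.getPath()));
        }

        public boolean next(LongWritable key, Text value) throws IOException {
            while (true) {
                if (reader == null) {
                    ZipEntry entry = zip.getNextEntry();
                    if (entry == null) {
                        return false; // no more entries in the archive
                    }
                    // ZipInputStream reports EOF at the end of each entry,
                    // so the reader cannot run past the entry boundary.
                    reader = new BufferedReader(new InputStreamReader(zip));
                }
                String line = reader.readLine();
                if (line == null) {   // current entry exhausted,
                    reader = null;    // move on to the next one
                    continue;
                }
                key.set(lineCount++);
                value.set(line);
                return true;
            }
        }

        public LongWritable createKey() { return new LongWritable(); }
        public Text createValue() { return new Text(); }
        public long getPos() { return lineCount; } // line count stands in for byte offset
        public float getProgress() { return 0.0f; }
        public void close() throws IOException { zip.close(); }
    }
}

The driver would then just call conf.setInputFormat(ZipInputFormat.class).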
What do you think about my plan? Any thoughts are greatly appreciated.

Thanks.
