On Mar 11, 2008, at 9:38 AM, Jimmy Wan wrote:

Is it possible to split compressed input from a single job to multiple map tasks?

It depends on the form of the compression. If you are using zlib (gzip) text file compression, then no. The problem is that a gzip file is one continuous stream and there is no way to start decoding in the middle of it, so the whole file has to go to a single map task. If you use block-compressed sequence files, then it will work fine, because each block is compressed independently. There are rumors of a bzip input format that supports input splitting, but bzip is very slow for many applications (although the compression is good).
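To illustrate the difference, here is a rough Python sketch (not Hadoop code). It assumes a toy data set of short text records; the helper names try_decompress_from and compress_blocks are made up for this example, and the "block" layout is only a stand-in for the idea behind block-compressed sequence files, not the actual SequenceFile format.

```python
import gzip
import zlib


def try_decompress_from(data: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading `data` at `offset`."""
    try:
        # wbits=31 makes zlib expect a gzip header at the start of the input.
        zlib.decompressobj(31).decompress(data[offset:])
        return True
    except zlib.error:
        # A slice taken mid-stream has no header and no earlier context.
        return False


def compress_blocks(records, records_per_block):
    """Compress groups of records independently (hypothetical block layout)."""
    blocks = []
    for i in range(0, len(records), records_per_block):
        chunk = b"".join(records[i:i + records_per_block])
        blocks.append(zlib.compress(chunk))
    return blocks


if __name__ == "__main__":
    records = [b"record %d\n" % i for i in range(10000)]

    # One gzip stream over everything: readable only from the beginning.
    whole = gzip.compress(b"".join(records))
    print("gzip from start: ", try_decompress_from(whole, 0))
    print("gzip from middle:", try_decompress_from(whole, len(whole) // 2))

    # Independent blocks: any block can be handed to a separate reader.
    blocks = compress_blocks(records, 1000)
    first_line = zlib.decompress(blocks[5]).split(b"\n")[0]
    print("block 5 starts with:", first_line)
```

Because every block is a complete compressed unit, a reader that is handed only block 5 can decode it without seeing blocks 0 through 4, which is the property a splittable input format needs.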

-- Owen
