Hi all,

I created some data using the randomwriter utility and compressed the map task outputs by passing the options -D mapred.output.compress=true and -D mapred.map.output.compression.type=BLOCK
I set the bytes per map to 128 MB, but because of compression the final size of each map task's output is around 75 MB. I want to use these individual 75 MB compressed files as input to another map task. How do I get Hadoop to decompress the files before computing the input splits for the map tasks?

Thanks,
Abhishek
