Splittable Gzip
---------------
Key: HADOOP-7076
URL: https://issues.apache.org/jira/browse/HADOOP-7076
Project: Hadoop Common
Issue Type: New Feature
Components: io
Reporter: Niels Basjes
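Files compressed with the gzip codec are not splittable due to the nature of
the codec: a gzip (DEFLATE) stream can only be decompressed starting from its
beginning.
This limits the options you have for scaling out when reading large gzipped
input files, because each such file ends up being processed by a single mapper.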
Given that gunzipping a 1 GiB file usually takes only about 2 minutes, I figured
that for some use cases wasting some resources may result in a shorter overall
job time.
So reading the entire input file from the start for each split (wasting
resources!), and skipping the decompressed data until the split's own portion
is reached, may lead to additional scalability.
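The idea can be illustrated with a small Python sketch. It is not the patch for this issue. It assumes splits are byte ranges of the compressed file and uses a made-up ownership rule. The function `read_split` and its `chunk` parameter are hypothetical and not part of any Hadoop API. Every split decompresses from offset 0, discards what comes before it, and stops once it has passed its end.

```python
import gzip
import zlib


def read_split(data: bytes, start: int, end: int, chunk: int = 64) -> list[bytes]:
    """Return the lines "owned" by the split [start, end) of the gzip bytes.

    Hypothetical ownership rule: a line belongs to the split containing the
    offset of the last compressed byte fed to the decompressor when the
    line's first byte came out. Every split replays the stream from offset 0
    with the same chunking, so all splits agree and each line is read once.
    """
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip wrapper
    owned: list[bytes] = []
    pending = b""  # partially decoded line carried across chunks
    line_pos = 0   # compressed offset attributed to `pending`'s first byte

    def consume(out: bytes, at: int) -> None:
        nonlocal pending, line_pos
        while out:
            if not pending:
                line_pos = at  # a new line starts in this output
            nl = out.find(b"\n")
            if nl < 0:
                pending += out
                return
            pending += out[: nl + 1]
            out = out[nl + 1 :]
            if start <= line_pos < end:
                owned.append(pending)
            pending = b""

    pos = 0
    while pos < len(data):
        piece = data[pos : pos + chunk]
        pos += len(piece)
        # Decompress everything up to here, even bytes before `start`:
        # this is the deliberate waste the proposal accepts.
        consume(d.decompress(piece), pos - 1)
        # Past our end with no line of ours still open: stop early.
        if pos >= end and (not pending or line_pos >= end):
            return owned

    consume(d.flush(), len(data) - 1)
    if pending and start <= line_pos < end:
        owned.append(pending)  # final line without a trailing newline
    return owned


if __name__ == "__main__":
    payload = b"".join(f"record {i}\n".encode() for i in range(1000))
    data = gzip.compress(payload)
    n = 4
    splits = [(k * len(data) // n, (k + 1) * len(data) // n) for k in range(n)]
    rebuilt = b"".join(b"".join(read_split(data, s, e)) for s, e in splits)
    print(rebuilt == payload)  # True: each line is read by exactly one split
```

The cost is uneven: the first split decompresses almost nothing extra, while the last split decompresses nearly the whole file. That is the trade-off being proposed, since the splits still run in parallel.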