[ http://issues.apache.org/jira/browse/HADOOP-474?page=all ]
Doug Cutting updated HADOOP-474:
--------------------------------
Status: Resolved (was: Patch Available)
Resolution: Fixed
I just committed this. Thanks, Owen!
> support compressed text files as input and output
> -------------------------------------------------
>
> Key: HADOOP-474
> URL: http://issues.apache.org/jira/browse/HADOOP-474
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.5.0
> Reporter: Owen O'Malley
> Assigned To: Owen O'Malley
> Fix For: 0.6.0
>
> Attachments: text-gz-2.patch, text-gz-3.patch, text-gz.patch
>
>
> I'd like TextInputFormat and TextOutputFormat to automatically compress and
> uncompress text files when they are read and written. Furthermore, I'd like
> to be able to use custom compressors as defined in HADOOP-441. Therefore, I
> propose:
> Adding a map of compression codecs in the server config files:
> io.compression.codecs = "<suffix>=<codec class>,..."
> so the default would be something like:
> <property>
>   <name>io.compression.codecs</name>
>   <value>.gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec</value>
>   <description>A list of file suffixes and the codecs for them.</description>
> </property>
> Note that a suffix can contain multiple "." characters, so suffixes like
> ".tar.gz" are supported; they are simply matched as literals against the
> end of the filename.
> If the TextInputFormat is dealing with such a file, it:
> 1. makes a single split
> 2. decompresses automatically
> On the output side, if mapred.output.compress is true, then TextOutputFormat
> would use a new property mapred.output.compression.codec that would define
> the codec to use to compress the outputs, defaulting to gzip.
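For readers following along, here is a rough sketch of how the proposed lookup could work. It is not the committed patch. It parses the "<suffix>=<codec class>,..." value, matches suffixes literally against the end of the filename, and picks the output codec from the two mapred.output.* properties. The class and method names (CodecSuffixMap, parse, codecFor, outputCodec) are hypothetical, java.util.Properties stands in for the Hadoop configuration, the "longest suffix wins" tie-break is an assumption the proposal does not spell out, and the gzip default is assumed to be the GZipCodec class named in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; names are illustrative, not taken from the patch.
class CodecSuffixMap {

    // Assumption: "defaulting to gzip" means the GZipCodec class from the example.
    static final String DEFAULT_OUTPUT_CODEC = "org.apache.hadoop.io.GZipCodec";

    /** Parses "<suffix>=<codec class>,..." into a suffix-to-class-name map. */
    static Map<String, String> parse(String spec) {
        Map<String, String> codecs = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("Bad codec entry: " + trimmed);
            }
            codecs.put(trimmed.substring(0, eq).trim(),
                       trimmed.substring(eq + 1).trim());
        }
        return codecs;
    }

    /**
     * Returns the codec class name whose suffix literally matches the end of
     * fileName, or null if none matches. If several suffixes match
     * (".gz" and ".tar.gz"), the longest one wins (an assumption).
     */
    static String codecFor(Map<String, String> codecs, String fileName) {
        String best = null;
        for (String suffix : codecs.keySet()) {
            if (fileName.endsWith(suffix)
                    && (best == null || suffix.length() > best.length())) {
                best = suffix;
            }
        }
        return best == null ? null : codecs.get(best);
    }

    /** Returns the output codec class name, or null if output is uncompressed. */
    static String outputCodec(Properties conf) {
        boolean compress =
            Boolean.parseBoolean(conf.getProperty("mapred.output.compress", "false"));
        if (!compress) {
            return null;
        }
        return conf.getProperty("mapred.output.compression.codec",
                                DEFAULT_OUTPUT_CODEC);
    }

    public static void main(String[] args) {
        Map<String, String> codecs = parse(
            ".gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec");
        System.out.println(codecFor(codecs, "part-00000.gz"));

        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");
        System.out.println(outputCodec(conf));
    }
}
```

On the input side, a non-null result from codecFor would be the signal to make a single split and decompress the stream; a null result means the file is read as plain text.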