Jerry Chen created HADOOP-9996:
----------------------------------

             Summary: Improve TFile format to support any compression codecs
                 Key: HADOOP-9996
                 URL: https://issues.apache.org/jira/browse/HADOOP-9996
             Project: Hadoop Common
          Issue Type: Improvement
          Components: io
    Affects Versions: 3.0.0
            Reporter: Jerry Chen


TFile is a container of key-value pairs. It supports block-level compression
through a compression codec. One limitation of the current implementation is
that it supports only a fixed set of codecs: LZO, GZ, or no compression. Newer
compression codecs such as Snappy cannot be used because of this limitation.

We propose to extend the existing TFile compression feature to support any
compression codec. TFile already uses named compression codecs and stores the
name in the file metadata (for example, “lzo” is stored when LZO compression
is used), so we cannot change this without breaking backward compatibility. To
support arbitrary codecs, we add a special name “codec”, followed by the real
codec class name. For example,
“codec: org.apache.hadoop.io.compress.SnappyCodec” is stored in the metadata
when SnappyCodec is used as the compression codec. The existing fixed names
“lzo”, “gz”, and “none” can still be used to specify the TFile compression
codec.
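
The naming scheme described above could be resolved roughly as follows. This is
only a minimal sketch, not the actual TFile code: the class CodecNameSketch, the
method resolveClassName, and the choice to tolerate whitespace after the colon
are assumptions made for illustration. It depends on no Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the proposed naming scheme; not part of Hadoop.
final class CodecNameSketch {
    // The fixed names mentioned in this issue; they keep their existing meaning.
    static final List<String> FIXED_NAMES = Arrays.asList("lzo", "gz", "none");

    // Assumed prefix that marks "a codec class name follows".
    static final String CODEC_PREFIX = "codec:";

    /**
     * Returns the codec class name carried by a stored compression name,
     * or null when the stored name is one of the existing fixed names.
     */
    static String resolveClassName(String storedName) {
        if (FIXED_NAMES.contains(storedName)) {
            return null; // handled by the existing fixed-name path
        }
        if (storedName.startsWith(CODEC_PREFIX)) {
            // Assumption: whitespace around the class name is not significant.
            String className = storedName.substring(CODEC_PREFIX.length()).trim();
            if (className.isEmpty()) {
                throw new IllegalArgumentException("Missing codec class name: " + storedName);
            }
            return className;
        }
        throw new IllegalArgumentException("Unsupported compression name: " + storedName);
    }
}
```

A reader would presumably load the returned class by reflection (for example
with Class.forName) and instantiate it as the codec. Because files written with
“lzo”, “gz”, or “none” never start with the prefix, they would continue to take
the existing path unchanged.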

 

