Columns can have different compression codecs, but I don't think implementations currently support writing with more than one codec. Readers should support it, though. I think columns are the right level for choosing a codec, and this enum adds very little overhead, so storing it in each column's metadata costs almost nothing compared to storing it once at the file level in the footer.
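To make "readers should support it" concrete, here is a minimal Python sketch of a reader that picks the decompressor for each column chunk from that chunk's own codec, instead of applying one file-wide codec. It is not parquet-mr's or any real library's API: `ColumnChunk`, `DECOMPRESSORS`, and `read_chunk` are hypothetical names, the payloads are toy bytes rather than real Parquet pages, and only the UNCOMPRESSED and GZIP codecs are modeled.

```python
import gzip
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class CompressionCodec(Enum):
    # Two members of Parquet's CompressionCodec enum, with their parquet.thrift
    # values. Other codecs (e.g. SNAPPY, ZSTD) are omitted to keep the sketch
    # dependency-free.
    UNCOMPRESSED = 0
    GZIP = 2


@dataclass
class ColumnChunk:
    # Hypothetical stand-in for a column chunk: its column path, the codec
    # recorded in its ColumnMetaData, and its compressed bytes.
    path: str
    codec: CompressionCodec
    data: bytes


# One decompressor per codec; the reader chooses among them per column chunk.
DECOMPRESSORS: Dict[CompressionCodec, Callable[[bytes], bytes]] = {
    CompressionCodec.UNCOMPRESSED: lambda raw: raw,
    CompressionCodec.GZIP: gzip.decompress,
}


def read_chunk(chunk: ColumnChunk) -> bytes:
    """Decompress a chunk using the codec stored for that column."""
    decompress = DECOMPRESSORS.get(chunk.codec)
    if decompress is None:
        raise NotImplementedError(f"unsupported codec: {chunk.codec.name}")
    return decompress(chunk.data)


# A toy "file" whose two columns use different codecs.
chunks = [
    ColumnChunk("id", CompressionCodec.UNCOMPRESSED, b"1,2,3"),
    ColumnChunk("comment", CompressionCodec.GZIP, gzip.compress(b"hello parquet")),
]

for chunk in chunks:
    print(chunk.path, chunk.codec.name, read_chunk(chunk))
```

Because the lookup happens per chunk, a file where every column uses the same codec is simply the case where all chunks carry the same value, so the reader needs no special handling for it.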
rb

On Fri, Sep 22, 2017 at 2:06 AM, 黄权隆 <[email protected]> wrote:

> Hi all,
>
> I asked a question on StackOverflow, but it seems this is the right place to
> ask. Link:
> https://stackoverflow.com/questions/46312522/is-it-possible-or-reasonable-for-a-parquet-file-to-have-multiple-compression-typ
>
> After reading the thrift definition of metadata in Parquet
> (https://github.com/apache/parquet-format#metadata), I found that there's a
> CompressionCodec field in each ColumnMetaData.
>
> Is it possible or reasonable for a Parquet file to have different
> compression types for different columns?
>
> If so, what's the scenario? How can I generate such a file for testing?
> ParquetWriter only accepts a compression type in its constructor.
>
> If not, why not move the CompressionCodec field into the FileMetaData? For
> example, ORC uses a uniform compression for all columns and writes its
> compression type in the Postscript (https://orc.apache.org/docs/file-tail.html).
>
>
> Thanks
> Quanlong

--
Ryan Blue
Software Engineer
Netflix
