[ 
https://issues.apache.org/jira/browse/PARQUET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña resolved PARQUET-417.
---------------------------------
    Resolution: Not A Problem

Moving questions to the dev@ list.

> Questionable encoding decisions
> -------------------------------
>
>                 Key: PARQUET-417
>                 URL: https://issues.apache.org/jira/browse/PARQUET-417
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Benjamin Anderson
>            Priority: Minor
>
> (Opening a ticket here because my mail to dev@ disappeared
> and there doesn't seem to be any other way to contact Parquet
> devs - feel free to redirect me somewhere else)
> I'm working on a small Parquet project and encountering
> some surprising results with regard to encoding decisions.
> My dataset consists of ~1.5 million log lines parsed to an Avro schema and
> written to a Parquet file via AvroParquetWriter. According to its log
> output, Parquet is writing all int/long columns out with either
> [BIT_PACKED, PLAIN] or [BIT_PACKED, PLAIN_DICTIONARY]. This surprised
> me - at least one of those columns is a monotonic epoch value that should be
> quite amenable to DELTA_BINARY_PACKED encoding. What's the best way to
> understand Parquet's encoding choices?
> Secondary question: Is DELTA_BINARY_PACKED supported for INT64
> columns? The documentation[1] says it is, but the code[2] suggests
> otherwise.
> Cheers.
> [1]: 
> https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5
> [2]: 
> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L166-L168
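
To illustrate why a monotonic epoch column looks like a good fit for delta encoding, here is a minimal Python sketch. It is not parquet-mr's implementation: it treats the whole column as a single block, ignores headers and miniblocks, and uses made-up timestamps. The helper name delta_bit_width and the data are assumptions for illustration only.

```python
import random


def delta_bit_width(values):
    """Bits per value needed for the deltas once the minimum delta is subtracted.

    Simplified, single-block view of the idea behind delta encoding;
    this is not how parquet-mr lays out its blocks and miniblocks.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return 0
    min_delta = min(deltas)
    # Store each delta relative to the smallest one, so all stored values are >= 0.
    return max(d - min_delta for d in deltas).bit_length()


# Hypothetical data: epoch-millisecond timestamps advancing 1..1000 ms per row.
rng = random.Random(0)
timestamps = [1_450_000_000_000]
for _ in range(999):
    timestamps.append(timestamps[-1] + rng.randint(1, 1000))

# PLAIN stores every INT64 value in full.
plain_bits = 64 * len(timestamps)
# Rough delta estimate: one full first value, then fixed-width deltas.
delta_bits = 64 + delta_bit_width(timestamps) * (len(timestamps) - 1)

print("plain bits:", plain_bits)
print("approx. delta bits:", delta_bits)
```

Under these assumptions each delta fits in at most 10 bits instead of 64, which is the intuition behind expecting DELTA_BINARY_PACKED for such a column. Whether the writer actually selects that encoding depends on its configuration, which this sketch does not model.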



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
