[ https://issues.apache.org/jira/browse/PARQUET-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem resolved PARQUET-839.
-----------------------------------
Resolution: Duplicate
> Min-max should be computed based on logical type
> ------------------------------------------------
>
> Key: PARQUET-839
> URL: https://issues.apache.org/jira/browse/PARQUET-839
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Affects Versions: format-2.3.1
> Reporter: Tim Armstrong
>
> The min/max stats are currently underspecified: the spec does not make
> clear, in any case, what the expected ordering is.
> There are some related issues, like PARQUET-686, that fix specific problems,
> but there seems to be a general assumption that the min/max should be
> defined based on the primitive type instead of the logical type.
> However, this makes the stats nearly useless for some logical types. E.g.
> consider a DECIMAL encoded into a (variable-length) BINARY. The min-max of
> the underlying binary type is based on the lexical order of the byte string,
> but that does not correspond to any reasonable ordering of the decimal
> values. E.g. 256 (0x01 0x00) will be ordered between 1 (0x01) and 2 (0x02).
> This makes min-max filtering a lot less effective and would force query
> engines using Parquet to implement workarounds (e.g. custom comparators)
> to produce correct results.
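
The ordering mismatch described in the report can be reproduced with a minimal sketch. It assumes the unscaled decimal value is stored as big-endian two's complement using the fewest bytes that fit, and it treats Python's unsigned lexicographic comparison of bytes objects as the "lexical order" of the byte string. The helper names encode_unscaled and decode_unscaled are hypothetical and are not part of any Parquet library.

```python
def encode_unscaled(value: int) -> bytes:
    """Encode an unscaled decimal as minimal-length big-endian two's complement.

    Assumption: this mirrors a DECIMAL stored in a variable-length BINARY.
    """
    length = 1
    while True:
        try:
            return value.to_bytes(length, "big", signed=True)
        except OverflowError:
            # Value does not fit in `length` bytes; try one more byte.
            length += 1


def decode_unscaled(raw: bytes) -> int:
    """Decode big-endian two's complement bytes back to an integer."""
    return int.from_bytes(raw, "big", signed=True)


values = [1, 2, 16, 256, -1]
encoded = [encode_unscaled(v) for v in values]

# Order by the raw bytes (unsigned, lexicographic), then decode for display.
byte_order = [decode_unscaled(b) for b in sorted(encoded)]

# Order by the logical (numeric) value.
logical_order = sorted(values)

print("byte order:   ", byte_order)
print("logical order:", logical_order)
print("min/max by bytes:", byte_order[0], byte_order[-1])
print("min/max by value:", logical_order[0], logical_order[-1])
```

Under these assumptions the byte-wise sort places 256 between 1 and 2 and puts -1 last, so the byte-wise min/max of this column chunk would be (1, -1) while the logical min/max is (-1, 256). A reader that filters on the byte-wise stats using numeric comparisons could skip data it should have read.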