[
https://issues.apache.org/jira/browse/ORC-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106521#comment-17106521
]
Panagiotis Garefalakis edited comment on ORC-629 at 5/13/20, 5:51 PM:
----------------------------------------------------------------------
Moving the fix into RecordReaderImpl: when either min, max, or sum contains
a non-finite number, the stats bundle is not useful, so PPD is disabled.
The implementation described above might need writer changes (adding a
hasNonFinite field to the proto), as it could affect merging multiple stripe
stats in memory; it can be implemented as a follow-up.
Keep in mind that PARQUET-1246 followed a similar path.
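To illustrate the idea, here is a minimal sketch of the guard, not the actual RecordReaderImpl change. The class and method names (NonFiniteStatsCheck, hasNonFinite, canSkipLessThan) are hypothetical and only mirror the check described above: if min, max, or sum is NaN or infinite, the stats are treated as unusable and the stripe is always read.

```java
public final class NonFiniteStatsCheck {
    private NonFiniteStatsCheck() {}

    /** Returns true if any of min, max, or sum is NaN or +/-Infinity. */
    public static boolean hasNonFinite(double min, double max, double sum) {
        return !(Double.isFinite(min) && Double.isFinite(max) && Double.isFinite(sum));
    }

    /**
     * Hypothetical pushdown check for the predicate "col < literal".
     * Returns true only when the stats prove that no row can match.
     */
    public static boolean canSkipLessThan(double min, double max, double sum, double literal) {
        if (hasNonFinite(min, max, sum)) {
            // Stats are not trustworthy (assumption: any non-finite value
            // disables PPD for this column), so the stripe must be read.
            return false;
        }
        // Every value is >= min, so if min >= literal nothing is < literal.
        return min >= literal;
    }

    public static void main(String[] args) {
        // Both comparisons against NaN are false, which is why min == max == NaN
        // stats cannot be used for range elimination.
        System.out.println(1.0 < Double.NaN);
        System.out.println(1.0 > Double.NaN);
        System.out.println(canSkipLessThan(Double.NaN, Double.NaN, Double.NaN, 1.0));
    }
}
```

With this guard, a stripe like the one in the issue description (min: NaN max: NaN sum: NaN) is never skipped, at the cost of losing PPD for that column.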
was (Author: pgaref):
Moving FIX as part of the RecordReaderImp, when either min, max or sum contains
a non-Finite number the stats bundle is not useful thus we disable PPD.
The implementation described above might need writer changes (add a
hasNonFinite in the proto) as it could affect merging multiple stripe stats in
memory -- thus can be a follow up of this.
Keep in mind that PARQUET-1246 followed a similar path.
> PPD: Floating point NaN is not transitive across comparisons
> ------------------------------------------------------------
>
> Key: ORC-629
> URL: https://issues.apache.org/jira/browse/ORC-629
> Project: ORC
> Issue Type: Bug
> Reporter: Gopal Vijayaraghavan
> Assignee: Panagiotis Garefalakis
> Priority: Major
>
> Range comparisons don't work right for columns which start with Double.NaN as
> the first row (min == max == NaN).
> 1 < NaN is false.
> 1 > NaN is false.
> {code}
> File Version: 0.12 with ORC_135
> Rows: 3
> Compression: ZLIB
> Compression size: 32768
> Type: struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<c:double>>
> Stripe Statistics:
> Stripe 1:
> Column 0: count: 3 hasNull: false
> Column 1: count: 3 hasNull: false bytesOnDisk: 5 min: 0 max: 0 sum: 0
> Column 2: count: 3 hasNull: false bytesOnDisk: 5 min: 1 max: 1 sum: 3
> Column 3: count: 3 hasNull: false bytesOnDisk: 8 min: 536870912 max: 536870912 sum: 1610612736
> Column 4: count: 3 hasNull: false bytesOnDisk: 7 min: 0 max: 2 sum: 3
> Column 5: count: 3 hasNull: false bytesOnDisk: 5 min: 1 max: 1 sum: 3
> Column 6: count: 3 hasNull: false
> Column 7: count: 3 hasNull: false bytesOnDisk: 19 min: NaN max: NaN sum: NaN
> {code}