[ https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398632#comment-16398632 ]

ASF GitHub Bot commented on PARQUET-1246:
-----------------------------------------

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174465693
 
 

 ##########
 File path: parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
 ##########
 @@ -73,6 +73,70 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics<?> build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
+          ((Statistics<?>) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (min == 0.0f) {
+            stats.setMinMax(-0.0f, max);
+            min = -0.0f;
+          }
+          if (max == -0.0f) {
+            stats.setMinMax(min, 0.0f);
+          }
+        }
+      }
+      return stats;
+    }
+  }
+
+  // Builder for DOUBLE type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class DoubleBuilder extends Builder {
+    public DoubleBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.DOUBLE;
+    }
+
+    @Override
+    public Statistics<?> build() {
 
 Review comment:
   Same comments as for the float case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Ignore float/double statistics in case of NaN
> ---------------------------------------------
>
>                 Key: PARQUET-1246
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1246
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.8.1
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: 1.10.0
>
>
> The sort order of floating-point values is not properly specified, so NaN
> values can cause valid values to be skipped when filtering. See
> PARQUET-1222 for more info.
> This issue is for ignoring float/double statistics when they contain NaN, to
> prevent data loss on the read path when filtering.
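
The failure mode described above can be illustrated with a minimal Java
sketch. It is not parquet-mr code: the class NaNStatsSketch and its methods
are hypothetical names, and the sketch assumes a writer that tracks min/max
with the primitive < and > operators and a reader that prunes with
Float.compare (which orders NaN above every other value). Under those
assumptions a NaN in the data poisons both bounds, and a "value < bound"
filter wrongly discards a chunk that holds matching values.

```java
// Hypothetical illustration only; none of these names exist in parquet-mr.
final class NaNStatsSketch {

    // Assumed writer behaviour: track min/max with primitive comparisons.
    // Every comparison against NaN is false, so a leading NaN is never replaced.
    static float[] naiveMinMax(float[] values) {
        float min = values[0];
        float max = values[0];
        for (float v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new float[] {min, max};
    }

    // Assumed reader behaviour for the predicate "value < bound":
    // drop the chunk when even the minimum is not below the bound.
    // Float.compare treats NaN as greater than any other float.
    static boolean canDropForLessThan(float min, float bound) {
        return Float.compare(min, bound) >= 0;
    }

    // Guarded variant in the spirit of this issue: if either bound is NaN,
    // treat the statistics as unusable and never drop the chunk.
    static boolean safeCanDropForLessThan(float min, float max, float bound) {
        if (Float.isNaN(min) || Float.isNaN(max)) {
            return false;
        }
        return Float.compare(min, bound) >= 0;
    }

    public static void main(String[] args) {
        float[] values = {Float.NaN, 1.0f, 2.0f};
        float[] minMax = naiveMinMax(values);

        // Both bounds are NaN even though 1.0 and 2.0 are present.
        System.out.println("min=" + minMax[0] + " max=" + minMax[1]);

        // The filter "value < 5.0" matches 1.0 and 2.0, yet the naive check drops the chunk.
        System.out.println("naive drop: " + canDropForLessThan(minMax[0], 5.0f));

        // Ignoring NaN statistics keeps the chunk, so no rows are lost.
        System.out.println("guarded drop: "
            + safeCanDropForLessThan(minMax[0], minMax[1], 5.0f));
    }
}
```

The guarded check trades pruning efficiency for correctness: a chunk whose
statistics contain NaN is always read, which is the conservative behaviour
this issue asks for.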



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
