jackye1995 commented on a change in pull request #1790:
URL: https://github.com/apache/iceberg/pull/1790#discussion_r528430637



##########
File path: core/src/main/java/org/apache/iceberg/MetricsUtil.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class MetricsUtil {
+
+  private MetricsUtil() {
+  }
+
+  public static Map<Integer, Long> getNanValueCounts(
+      Stream<FieldMetrics> fieldMetrics, MetricsConfig metricsConfig, Schema 
inputSchema) {
+    if (fieldMetrics == null || inputSchema == null) {
+      return Maps.newHashMap();
+    }
+
+    return fieldMetrics
+        .filter(metrics -> {

Review comment:
       Since we are having this util class, can we decompose this function into 
methods like `metricsColumnName(FieldMetrics, Schema)` and 
`metricsMode(FieldMetrics, MetricsConfig)`? They might be useful in other 
classes, and also make the lambda chain cleaner.

##########
File path: flink/src/main/java/org/apache/iceberg/flink/data/FlinkOrcWriter.java
##########
@@ -129,5 +139,9 @@ private WriteBuilder() {
               "Invalid iceberg type %s corresponding to Flink logical type 
%s", iPrimitive, flinkPrimitive));
       }
     }
+
+    private int getFieldId(TypeDescription typeDescription) {

Review comment:
       nit:same comment as before, do we need this private method for the 1 
line call?

##########
File path: api/src/main/java/org/apache/iceberg/NaNOnlyFieldMetrics.java
##########
@@ -17,51 +17,50 @@
  * under the License.
  */
 
-package org.apache.iceberg.parquet;
+package org.apache.iceberg;
 
 import java.nio.ByteBuffer;
-import org.apache.iceberg.FieldMetrics;
 
 /**
- * Iceberg internally tracked field level metrics, used by Parquet writer only.
+ * Iceberg internally tracked field level metrics, used by Parquet and ORC 
writers only.
  * <p>
- * Parquet keeps track of most metrics in its footer, and only NaN counter is 
actually tracked by writers.
- * This wrapper ensures that metrics not being updated by Parquet writers will 
not be incorrectly used, by throwing
+ * Parquet/ORC keeps track of most metrics in file statistics, and only NaN 
counter is actually tracked by writers.
+ * This wrapper ensures that metrics not being updated by those writers will 
not be incorrectly used, by throwing
  * exceptions when they are accessed.
  */
-public class ParquetFieldMetrics extends FieldMetrics {

Review comment:
       I thought the reason for having `ParquetFieldMetrics` extending 
`FieldMetrics` is to allow ORC and parquet to diverge if needed. If we are 
moving them back to a common class, why not just move everything back to 
`FieldMetrics`?

##########
File path: core/src/main/java/org/apache/iceberg/MetricsUtil.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class MetricsUtil {
+
+  private MetricsUtil() {
+  }
+
+  public static Map<Integer, Long> getNanValueCounts(

Review comment:
       nit: seems like iceberg prefers method names without `get`.

##########
File path: flink/src/main/java/org/apache/iceberg/flink/data/FlinkOrcWriter.java
##########
@@ -46,8 +50,8 @@ private FlinkOrcWriter(RowType rowType, Schema iSchema) {
     }
   }
 
-  public static OrcRowWriter<RowData> buildWriter(RowType rowType, Schema 
iSchema) {
-    return new FlinkOrcWriter(rowType, iSchema);
+  public static OrcRowWriter<RowData> buildWriter(RowType rowType, Schema 
iSchema, TypeDescription schema) {

Review comment:
       is it necesSary to pass in the type description? can we get id from the 
iceberg schema?

##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriter.java
##########
@@ -109,6 +112,10 @@ private WriteBuilder() {
               iPrimitive, primitive));
       }
     }
+
+    private int getFieldId(TypeDescription typeDescription) {

Review comment:
       nit: the private method feels redundant since the body is also just one 
line

##########
File path: 
spark/src/main/java/org/apache/iceberg/spark/source/StructInternalRow.java
##########
@@ -115,12 +120,30 @@ public short getShort(int ordinal) {
 
   @Override
   public int getInt(int ordinal) {
-    return struct.get(ordinal, Integer.class);
+    Object integer = struct.get(ordinal, Object.class);

Review comment:
       For this issue about handling date and timestamp, can it be a separated 
PR? This PR is already very big.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to