RussellSpitzer commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r729957842



##########
File path: core/src/main/java/org/apache/iceberg/DataFiles.java
##########
@@ -285,7 +285,12 @@ public DataFile build() {
       }
       Preconditions.checkArgument(format != null, "File format is required");
       Preconditions.checkArgument(fileSizeInBytes >= 0, "File size is 
required");
-      Preconditions.checkArgument(recordCount >= 0, "Record count is 
required");
+      Preconditions.checkArgument(recordCount != null, "Record count is 
required");
+      // MetricsEvaluator skips using other metrics, if record count is -1
+      Preconditions.checkArgument(recordCount >= 0 ||
+              (recordCount == -1 && valueCounts == null && columnSizes == null 
&& nanValueCounts == null &&
+                      lowerBounds == null && upperBounds == null),
+          "Metrics cannot be set if record count is -1.");

Review comment:
       Because our importAvroPartitions code expects to be able to do this and 
our metrics evaluation code assumes it can be -1 as well. We could forbid this 
but that would require changing the import avro code to also fully scan avro 
files before importing.

##########
File path: core/src/main/java/org/apache/iceberg/DataFiles.java
##########
@@ -285,7 +285,12 @@ public DataFile build() {
       }
       Preconditions.checkArgument(format != null, "File format is required");
       Preconditions.checkArgument(fileSizeInBytes >= 0, "File size is 
required");
-      Preconditions.checkArgument(recordCount >= 0, "Record count is 
required");
+      Preconditions.checkArgument(recordCount != null, "Record count is 
required");
+      // MetricsEvaluator skips using other metrics, if record count is -1
+      Preconditions.checkArgument(recordCount >= 0 ||
+              (recordCount == -1 && valueCounts == null && columnSizes == null 
&& nanValueCounts == null &&
+                      lowerBounds == null && upperBounds == null),
+          "Metrics cannot be set if record count is -1.");

Review comment:
       Because our importAvroPartitions code expects to be able to do this and 
our metrics evaluation code assumes it can be -1 as well. We could forbid this 
but that would require changing the import avro code to also fully scan avro 
files before importing. As is, importAvro has been broken since we switched to 
the builder.

##########
File path: core/src/main/java/org/apache/iceberg/DataFiles.java
##########
@@ -285,7 +285,12 @@ public DataFile build() {
       }
       Preconditions.checkArgument(format != null, "File format is required");
       Preconditions.checkArgument(fileSizeInBytes >= 0, "File size is 
required");
-      Preconditions.checkArgument(recordCount >= 0, "Record count is 
required");
+      Preconditions.checkArgument(recordCount != null, "Record count is 
required");
+      // MetricsEvaluator skips using other metrics, if record count is -1
+      Preconditions.checkArgument(recordCount >= 0 ||
+              (recordCount == -1 && valueCounts == null && columnSizes == null 
&& nanValueCounts == null &&
+                      lowerBounds == null && upperBounds == null),
+          "Metrics cannot be set if record count is -1.");

Review comment:
       I don't think there is a problem with that, we were just working with 
the past behavior which was to include a rowcount of -1.  I think writing new 
code to skip through avro blocks and sum up row counts is fine too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to