amogh-jahagirdar commented on code in PR #16213:
URL: https://github.com/apache/iceberg/pull/16213#discussion_r3196042053
##########
core/src/main/java/org/apache/iceberg/ManifestInfo.java:
##########
@@ -53,6 +55,28 @@ interface ManifestInfo {
"min_sequence_number",
Types.LongType.get(),
"Minimum sequence number of files in this manifest");
+ Types.StructType PARTITION_SUMMARY_TYPE =
+ Types.StructType.of(
+ Types.NestedField.required(
+ 509,
+ "contains_null",
+ Types.BooleanType.get(),
+ "True if any file has a null partition value"),
+ Types.NestedField.optional(
+ 518,
+ "contains_nan",
+ Types.BooleanType.get(),
+ "True if any file has a nan partition value"),
+ Types.NestedField.optional(
+ 510, "lower_bound", Types.BinaryType.get(), "Partition lower
bound for all files"),
+ Types.NestedField.optional(
+ 511, "upper_bound", Types.BinaryType.get(), "Partition upper
bound for all files"));
+ Types.NestedField PARTITION_SUMMARIES =
Review Comment:
Even if we keep partition tuples for data file entries, I don't think we
neccessarily need to keep partition field summaries for manifests as well, we
can still use stats at this level in the tree. In fact, I largely think we
shouldn't since it adds more complexity to the structure.
Ultimately, at the manifest level we want to preserve pruning but I think we
can do that with stats, so long as we ensure that stats on source columns are
collected and we have the rules to aggregate them correct. That was independent
of the decision to keep the tuple or not. Let me know if I'm missing something
@nastra @anoopj @rdblue
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]