kbendick opened a new pull request, #5009: URL: https://github.com/apache/iceberg/pull/5009
This closes issue https://github.com/apache/iceberg/issues/4689

Currently, if a table has `write.summary.partition-limit` (`WRITE_PARTITION_SUMMARY_LIMIT`) set to a non-zero value, a commit generates partition summary data for a partition field with an empty string name. This field essentially comes from the unpartitioned spec.

Notice that the `partitions.` field, which is the [changed partitions prefix](https://github.com/apache/iceberg/blob/3584c79022ec70f79b326550736b4600d249e4a2/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L54), should normally have the partition field name after it, as it's a summary for that partition field.

However, for unpartitioned tables, if the partition summary limit is greater than zero, a `partitions.` field is added for an empty string column. Here's an example:

```json
"summary" : {
  "operation" : "append",
  "spark.app.id" : "local-1651499999454",
  "added-data-files" : "1",
  "added-records" : "1",
  "added-files-size" : "608",
  "changed-partition-count" : "1",
  "partition-summaries-included" : "true",
  "partitions." : "added-data-files=1,added-records=1,added-files-size=608",
  "total-records" : "1",
  "total-files-size" : "608",
  "total-data-files" : "1",
  "total-delete-files" : "0",
  "total-position-deletes" : "0",
  "total-equality-deletes" : "0"
}
```

This PR removes partition summaries for commits on unpartitioned tables.
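To make the intended behavior concrete, here is a minimal Python sketch of the guard described above. It is an illustrative model only, not Iceberg's Java implementation: the function `build_summary`, its parameters, and the use of a partition path such as `day=2022-05-02` as the key suffix are all assumptions made for this example. Only the `partitions.` prefix and the summary key names are taken from the description and the JSON sample.

```python
CHANGED_PARTITION_PREFIX = "partitions."  # prefix named in the PR description


def build_summary(added_files, partition_field_names, partition_summary_limit):
    """Build a snapshot-summary-like dict for an append (hypothetical helper).

    added_files: list of (partition_path, record_count, file_size) tuples;
    partition_path is "" for an unpartitioned table (assumption of this sketch).
    """
    # Aggregate per-partition metrics for the files added by this commit.
    per_partition = {}
    for path, records, size in added_files:
        stats = per_partition.setdefault(path, {"files": 0, "records": 0, "size": 0})
        stats["files"] += 1
        stats["records"] += records
        stats["size"] += size

    summary = {
        "operation": "append",
        "added-data-files": str(len(added_files)),
        "added-records": str(sum(records for _, records, _ in added_files)),
        "added-files-size": str(sum(size for _, _, size in added_files)),
        "changed-partition-count": str(len(per_partition)),
    }

    # The guard this PR describes: emit per-partition entries only when the
    # table actually has partition fields, in addition to the limit check.
    is_partitioned = len(partition_field_names) > 0
    within_limit = (
        partition_summary_limit > 0
        and len(per_partition) <= partition_summary_limit
    )

    if is_partitioned and within_limit:
        summary["partition-summaries-included"] = "true"
        for path, stats in per_partition.items():
            summary[CHANGED_PARTITION_PREFIX + path] = (
                f"added-data-files={stats['files']},"
                f"added-records={stats['records']},"
                f"added-files-size={stats['size']}"
            )
    return summary
```

With this guard, calling `build_summary([("", 1, 608)], [], 100)` for an unpartitioned table yields no `partitions.` key, while a table partitioned by a hypothetical `day` field still gets its per-partition entry. Whether the actual change also alters other keys such as `changed-partition-count` is not stated in the description, so the sketch leaves them untouched.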
