rdblue commented on code in PR #14234:
URL: https://github.com/apache/iceberg/pull/14234#discussion_r3244956366


##########
format/spec.md:
##########
@@ -675,7 +676,7 @@ The `data_file` struct consists of the following fields:
 
 The `partition` struct stores the tuple of partition values for each file. Its 
type is derived from the partition fields of the partition spec used to write 
the manifest file. In v2, the partition struct's field ids must match the ids 
from the partition spec.
 
-The column metrics maps are used when filtering to select both data and delete 
files. For delete files, the metrics must store bounds and counts for all 
deleted rows, or must be omitted. Storing metrics for deleted rows ensures that 
the values can be used during job planning to find delete files that must be 
merged during a scan.
+The v4 `content_stats` struct stores field-level metrics. Unlike the metrics 
maps, the type of `content_stats` is based on table metadata, like schema. 
Similar to the `partition` struct, the same type is used for all files tracked 
in a manifest.

Review Comment:
   @danielcweeks, I'll update this to call out that it is talking about the 
`content_stats` _container_ struct, not the individual field structs. I've used 
the convention that `content_stats` refers to the container struct, "content 
stats" refers to the whole thing, and I try to use wording like "field-level 
stats struct" for the field-specific ones.
   
   @RussellSpitzer, I considered that, but since we don't have the root 
manifest in yet, I thought it was easier to add aggregation later. I tried to 
avoid mentioning whether stats were for data files or manifest files so that we 
can reuse the wording here for both and cover the meaning or scope of content 
stats separately (i.e. in an aggregation section for manifests).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

