ggershinsky commented on code in PR #4945:
URL: https://github.com/apache/iceberg/pull/4945#discussion_r928556124
##########
format/spec.md:
##########
@@ -631,6 +632,30 @@ When expiring snapshots, retention policies in table and snapshot references are
2. The snapshot is not one of the first `min-snapshots-to-keep` in the
branch (including the branch's referenced snapshot)
5. Expire any snapshot not in the set of snapshots to retain.
+#### Statistics file
+
+Statistics files are valid [Puffin files](../puffin-spec). Statistics are informational. A reader can choose to
+ignore statistics information. Statistics support is not required to read the table correctly.
+
+Statistics file's metadata within `statistics` table snapshot field is a struct with the following fields:
+
+| v1         | v2         | Field name               | Type     | Description                                                            |
+|------------|------------|--------------------------|----------|------------------------------------------------------------------------|
+| _required_ | _required_ | **`statistics-path`**    | `string` | Path of the statistics file. See [Puffin file format](../puffin-spec). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long`   | Size of the statistics file.                                           |
Review Comment:
> add key_metadata | Base64-encoded implementation-specific key metadata for encryption here.

SGTM.

> do we want whole-file encryption, or do we need to encrypt individual blobs?

Both `key_metadata` and the encryption itself are implementation-specific, so
there is probably no need to add these details here. (As for implementations, we
can start with a simple file-encrypting stream, similar to the one used for
manifest files, and expand to other techniques later.)

> if a file is encrypted, is it more useful to store file-size-in-bytes before encryption, or we also/instead need to have a number after the encryption

This depends on the intended use of this field. For manifest files, we don't
use the `file-size-in-bytes` field for decryption; we get the full file length
directly from `InputFile.getLength()`, which is sourced from the file system. In
other words, decryption works with any definition of this field, because it
doesn't use it.
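
To illustrate the last point, here is a minimal sketch of whole-file encryption in which the reader takes the file length from the file system and never consults a recorded `file-size-in-bytes`. It is not Iceberg's encryption code: the class `WholeFileEncryptionSketch`, its method names, and the `nonce || ciphertext || tag` layout with AES-GCM are all assumptions made for the example, and `Files.size()` merely stands in for `InputFile.getLength()`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; not part of Iceberg and not its actual file layout.
final class WholeFileEncryptionSketch {
  private static final int NONCE_LENGTH = 12; // bytes, usual AES-GCM nonce size
  private static final int TAG_BITS = 128;    // AES-GCM authentication tag size

  // Encrypts the whole payload and writes nonce || ciphertext || tag to the path.
  static void writeEncrypted(Path path, byte[] plaintext, byte[] key)
      throws IOException, GeneralSecurityException {
    byte[] nonce = new byte[NONCE_LENGTH];
    new SecureRandom().nextBytes(nonce);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new GCMParameterSpec(TAG_BITS, nonce));
    byte[] ciphertext = cipher.doFinal(plaintext); // includes the 16-byte tag
    byte[] stored = ByteBuffer.allocate(nonce.length + ciphertext.length)
        .put(nonce).put(ciphertext).array();
    Files.write(path, stored);
  }

  // Decrypts using only the length reported by the file system.
  // No metadata field such as file-size-in-bytes is consulted.
  static byte[] readDecrypted(Path path, byte[] key)
      throws IOException, GeneralSecurityException {
    long length = Files.size(path); // stands in for InputFile.getLength()
    if (length < NONCE_LENGTH + TAG_BITS / 8) {
      throw new IOException("File too short to be an encrypted file: " + length);
    }
    byte[] stored = Files.readAllBytes(path);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
        new GCMParameterSpec(TAG_BITS, stored, 0, NONCE_LENGTH));
    return cipher.doFinal(stored, NONCE_LENGTH, stored.length - NONCE_LENGTH);
  }

  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16];
    new SecureRandom().nextBytes(key);
    byte[] plaintext = "puffin stats".getBytes(StandardCharsets.UTF_8);
    Path path = Files.createTempFile("stats", ".enc");
    try {
      writeEncrypted(path, plaintext, key);
      System.out.println("plaintext bytes: " + plaintext.length);
      System.out.println("encrypted bytes: " + Files.size(path));
      System.out.println(new String(readDecrypted(path, key), StandardCharsets.UTF_8));
    } finally {
      Files.deleteIfExists(path);
    }
  }
}
```

In this sketch the encrypted file is 28 bytes longer than the plaintext (a 12-byte nonce plus a 16-byte tag), so the two candidate definitions of `file-size-in-bytes` differ, yet decryption succeeds either way because it relies only on the on-disk length.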