cwsteinbach commented on a change in pull request #3037:
URL: https://github.com/apache/iceberg/pull/3037#discussion_r697064622
##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
This is a specification for the Iceberg table format that is designed to
manage a large, slow-changing collection of files in a distributed file system
or key-value store as a table.
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the
community.
+
+The format version number is incremented when new features are added that will
break forward-compatibility---that is, when older readers would not read newer
table features correctly. Tables may continue to be written with an older
version of the spec to ensure compatibility by not using features that are not
yet implemented by processing engines.
Review comment:
I recommend replacing the second sentence with one that explicitly
mentions "backward compatibility", e.g. `Iceberg guarantees backward
compatibility across all versions, so a newer reader will always be able to
read a table written by an older writer.`
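To make the distinction concrete, here is a minimal sketch of the rule as I read it. The helper name `can_read` and its parameters are hypothetical and only illustrate the idea; they are not part of any Iceberg library API.

```python
def can_read(reader_max_version: int, table_format_version: int) -> bool:
    """Return True if a reader that implements format versions up to
    `reader_max_version` can safely read a table written at
    `table_format_version`.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    if reader_max_version < 1 or table_format_version < 1:
        raise ValueError("format versions start at 1")
    # Backward compatibility: a newer reader accepts every older table version.
    # Forward compatibility is not promised: an older reader must reject
    # a table written with a newer format version.
    return table_format_version <= reader_max_version


# A v2 reader reading a v1 table (backward compatible).
print(can_read(reader_max_version=2, table_format_version=1))
# A v1 reader reading a v2 table (forward compatibility is broken).
print(can_read(reader_max_version=1, table_format_version=2))
```

The first call covers the case the suggested sentence guarantees; the second is the case the existing text describes as the reason for incrementing the version number.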
##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
This is a specification for the Iceberg table format that is designed to
manage a large, slow-changing collection of files in a distributed file system
or key-value store as a table.
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the
community.
Review comment:
I'm not sure what "finished" and "supported by the community" actually
mean here, e.g., which community? What is finished (the spec? all of the reader
libraries? integrations with all supported engines?)
Please consider replacing instances of "Iceberg format" with "Iceberg format
specification".
Also, including the date when each format version was released would be nice.
##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
This is a specification for the Iceberg table format that is designed to
manage a large, slow-changing collection of files in a distributed file system
or key-value store as a table.
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the
community.
+
+The format version number is incremented when new features are added that will
break forward-compatibility---that is, when older readers would not read newer
table features correctly. Tables may continue to be written with an older
version of the spec to ensure compatibility by not using features that are not
yet implemented by processing engines.
Review comment:
On a side note, we really need to add a "version compatibility" page to
the docs that tracks which versions of compute engines work with which
versions of the Iceberg spec.
##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
This is a specification for the Iceberg table format that is designed to
manage a large, slow-changing collection of files in a distributed file system
or key-value store as a table.
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the
community.
+
+The format version number is incremented when new features are added that will
break forward-compatibility---that is, when older readers would not read newer
table features correctly. Tables may continue to be written with an older
version of the spec to ensure compatibility by not using features that are not
yet implemented by processing engines.
+
#### Version 1: Analytic Data Tables
-**Iceberg format version 1 is the current version**. It defines how to manage
large analytic tables using immutable file formats: Parquet, Avro, and ORC.
+Iceberg format version 1 defines how to manage large analytic tables using
immutable file formats: Parquet, Avro, and ORC.
#### Version 2: Row-level Deletes
-The Iceberg community is currently working on version 2 of the Iceberg format
that supports encoding row-level deletes. **The v2 specification is incomplete
and may change until it is finished and adopted.** This document includes
tentative v2 format requirements, but there are currently no compatibility
guarantees with the unfinished v2 spec.
+Iceberg format version 2 adds row-level deletes for analytic tables with
immutable files.
Review comment:
While I understand that the mechanism for providing row-level updates
depends on row-level "delete files", I think it will cause less confusion to
say that Version 2 supports "row-level updates and deletes".
##########
File path: site/docs/spec.md
##########
@@ -1002,7 +1010,7 @@ This serialization scheme is for storing single values as
individual binary valu
| **`map`** | Not supported
|
-## Format version changes
+## Appendix D: Format version changes
### Version 2
Review comment:
I think it would help readers to start this section with a list of
changes to metadata fields (fields that were added, existing optional fields
that are now mandatory, etc.).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]