cwsteinbach commented on a change in pull request #3037:
URL: https://github.com/apache/iceberg/pull/3037#discussion_r697064622



##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.

Review comment:
       I recommend replacing the second sentence with one that explicitly 
mentions "backward compatibility", e.g. `Iceberg guarantees backward 
compatibility across all versions, so a newer reader will always be able to 
read a table written by an older writer.`
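
A minimal sketch of the compatibility rule under discussion, assuming a reader that parses the table metadata JSON and compares its `format-version` field with the highest version it supports. A newer reader accepts older tables (backward compatibility), while an older reader rejects a newer table instead of misreading it. `MAX_SUPPORTED_FORMAT_VERSION`, `check_readable`, and `UnsupportedFormatVersion` are hypothetical names for illustration, not part of any Iceberg library.

```python
import json

# Hypothetical: the highest format version this illustrative reader understands.
MAX_SUPPORTED_FORMAT_VERSION = 2


class UnsupportedFormatVersion(Exception):
    """Raised when a table uses a newer format version than the reader supports."""


def check_readable(metadata_json: str,
                   max_supported: int = MAX_SUPPORTED_FORMAT_VERSION) -> int:
    """Return the table's format version if this reader can safely read it.

    A reader that supports version N can read any table with version <= N.
    Newer versions may carry features the reader does not understand, so
    they are rejected rather than silently misread.
    """
    metadata = json.loads(metadata_json)
    version = metadata["format-version"]
    if version > max_supported:
        raise UnsupportedFormatVersion(
            f"table format-version {version} is newer than "
            f"supported version {max_supported}"
        )
    return version
```

For example, `check_readable('{"format-version": 1}')` returns `1`, while `check_readable('{"format-version": 2}', max_supported=1)` raises `UnsupportedFormatVersion`.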

##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.

Review comment:
       I'm not sure what "finished" and "supported by the community" actually 
mean, e.g., which community? What is finished (the spec? all of the reader 
libraries? integrations with all supported engines?)
   
   Please consider replacing instances of "Iceberg format" with "Iceberg format 
specification".
   
   Also, including dates when each format version was released would be nice.

##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.

Review comment:
       On a side note, we really need to add a "version compatibility" page to 
the docs that tracks which versions of compute engines work with which 
versions of the Iceberg spec.

##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.
+
 #### Version 1: Analytic Data Tables
 
-**Iceberg format version 1 is the current version**. It defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
+Iceberg format version 1 defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
 
 #### Version 2: Row-level Deletes
 
-The Iceberg community is currently working on version 2 of the Iceberg format that supports encoding row-level deletes. **The v2 specification is incomplete and may change until it is finished and adopted.** This document includes tentative v2 format requirements, but there are currently no compatibility guarantees with the unfinished v2 spec.
+Iceberg format version 2 adds row-level deletes for analytic tables with immutable files.

Review comment:
       While I understand that the mechanism for providing row-level updates 
depends on row-level "delete files", I think it will cause less confusion to 
say that Version 2 supports "row-level updates and deletes".
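
To illustrate why delete files also cover updates, here is a simplified in-memory sketch, assuming position deletes identified by a data file path and a row position (`file_path`, `pos`). An update never rewrites the immutable data file: it records a delete for the old row and writes the new row to a new data file, and the reader merges the two. The file names and the `read_table` helper are hypothetical, and the sketch ignores sequence numbers, equality deletes, and real file I/O.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    """Simplified position-delete entry: the row at `pos` in `file_path` is deleted."""
    file_path: str
    pos: int


def read_table(data_files: dict[str, list[dict]],
               deletes: list[PositionDelete]) -> list[dict]:
    """Return live rows: all rows of all data files minus rows marked deleted."""
    deleted = {(d.file_path, d.pos) for d in deletes}
    live = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if (path, pos) not in deleted:
                live.append(row)
    return live


# "Update" id=2 from "b" to "B" without modifying data-1.parquet:
# delete its old row by position and add the new row in data-2.parquet.
data_files = {
    "data-1.parquet": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "data-2.parquet": [{"id": 2, "name": "B"}],
}
deletes = [PositionDelete("data-1.parquet", 1)]
print(read_table(data_files, deletes))
```

Running this prints `[{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'B'}]`: from the reader's point of view the row was updated, even though the mechanism is a delete plus an insert.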

##########
File path: site/docs/spec.md
##########
@@ -1002,7 +1010,7 @@ This serialization scheme is for storing single values as individual binary valu
 | **`map`**                    | Not supported                                 |
 
 
-## Format version changes
+## Appendix D: Format version changes
 
 ### Version 2
 

Review comment:
       I think it would help readers to start this section with a list of 
changes to metadata fields (fields that were added, existing optional fields 
that are now mandatory, etc.).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


