kbendick commented on a change in pull request #2101:
URL: https://github.com/apache/iceberg/pull/2101#discussion_r559259799
##########
File path: site/docs/evolution.md
##########
@@ -62,3 +62,31 @@ When you evolve a partition spec, the old data written with
an earlier spec rema
Iceberg uses [hidden partitioning](./partitioning.md), so you don't *need* to
write queries for a specific partition layout to be fast. Instead, you can
write queries that select the data you need, and Iceberg automatically prunes
out files that don't contain matching data.
Partition evolution is a metadata operation and does not eagerly rewrite files.
+
+Iceberg's Java table API provides `updateSpec` API to update partition spec.
For example:
+
+```java
+sampleTable.updateSpec()
+ .addField(bucket("id", 8))
+ .renameField("category", "category")
+ .removeField("id_bucket_8", "shard")
Review comment:
I'm unable to find the `removeField` function with two string parameters
in the `UpdatePartitionSpec` interface. Are you sure you intended to call
`removeField` here and not possibly `renameField`?
##########
File path: site/docs/evolution.md
##########
@@ -62,3 +62,31 @@ When you evolve a partition spec, the old data written with
an earlier spec rema
Iceberg uses [hidden partitioning](./partitioning.md), so you don't *need* to
write queries for a specific partition layout to be fast. Instead, you can
write queries that select the data you need, and Iceberg automatically prunes
out files that don't contain matching data.
Partition evolution is a metadata operation and does not eagerly rewrite files.
+
+Iceberg's Java table API provides `updateSpec` API to update partition spec.
For example:
+
+```java
+sampleTable.updateSpec()
+ .addField(bucket("id", 8))
+ .renameField("category", "category")
Review comment:
What does this partition spec update do? Seems like a no-op for a field
rename (other than possibly reassigning IDs for this column? - just a guess on
that front).
I can appreciate that this is allowed, but unless this call does something
that I'm not aware of, I think that adding this to the documentation's example
is potentially more confusing than helpful to those who are learning.
Otherwise, like I mentioned elsewhere, it's probably good to write out what
this updateSpec call does as it's not immediately self evident - at least to
me, though that could be my own limitation and maybe it's clear to others.
##########
File path: site/docs/evolution.md
##########
@@ -62,3 +62,31 @@ When you evolve a partition spec, the old data written with
an earlier spec rema
Iceberg uses [hidden partitioning](./partitioning.md), so you don't *need* to
write queries for a specific partition layout to be fast. Instead, you can
write queries that select the data you need, and Iceberg automatically prunes
out files that don't contain matching data.
Partition evolution is a metadata operation and does not eagerly rewrite files.
+
+Iceberg's Java table API provides `updateSpec` API to update partition spec.
For example:
+
+```java
+sampleTable.updateSpec()
+ .addField(bucket("id", 8))
+ .renameField("category", "category")
+ .removeField("id_bucket_8", "shard")
+ .commit();
+```
Review comment:
Something to consider:
You might consider stating what this `updateSpec` code is going to do.
Something like `For example, the following code could be used to update the
partition spec to bucket `id` column into 8 buckets....`.
Additionally, I think it would be helpful to indicate whether or not this
changes the old data.
Your added `Sort order evolution` docs say `When you evolve a sort order,
the old data written with an earlier order remains unchanged.`, to me it begs
the question of whether or not updating the partition spec via `updateSpec`
will rewrite old data - and if it does not rewrite old data, what precautions
do we recommend to people who might use this?
##########
File path: site/docs/evolution.md
##########
@@ -62,3 +62,31 @@ When you evolve a partition spec, the old data written with
an earlier spec rema
Iceberg uses [hidden partitioning](./partitioning.md), so you don't *need* to
write queries for a specific partition layout to be fast. Instead, you can
write queries that select the data you need, and Iceberg automatically prunes
out files that don't contain matching data.
Partition evolution is a metadata operation and does not eagerly rewrite files.
+
+Iceberg's Java table API provides `updateSpec` API to update partition spec.
For example:
+
+```java
+sampleTable.updateSpec()
+ .addField(bucket("id", 8))
+ .renameField("category", "category")
+ .removeField("id_bucket_8", "shard")
+ .commit();
+```
+
+Spark supports updating partition spec through its `ALTER TABLE` SQL
statement, see more details in [Spark
SQL](../spark/#alter-table-add-partition-field).
+
+## Sort order evolution
+
+Similar to partition spec, Iceberg sort order can also be updated in an
existing table.
+When you evolve a sort order, the old data written with an earlier order
remains unchanged.
+Engines can always choose to write data in the latest sort order or unsorted
when sorting is prohibitively expensive.
Review comment:
When a table has a sort order spec, but the older data is not sorted
according to the spec, can this cause queries to silently return incorrect
data? Or is this not an issue given that engines can already choose to write
data sorted or not based on how expensive it's deemed to be.
Possibly this is more elucidated elsewhere, but otherwise I think it would
be good to clarify if changes to the sort order can cause incorrect query
results (e.g. if the query engine makes the assumption that data is sorted
during execution planning).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]