rdblue commented on a change in pull request #1523:
URL: https://github.com/apache/iceberg/pull/1523#discussion_r496257864
##########
File path: site/docs/spark.md
##########
@@ -519,6 +519,59 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+## Writing to partitioned tables
+
+Iceberg requires the data to be sorted according to the partition spec prior to writing to a partitioned table.
+This applies to both writing with SQL and writing with DataFrames.
+
+Suppose we would like to write data to the sample table below:
+
+```sql
+CREATE TABLE prod.db.sample (
+ id bigint,
+ data string,
+ category string,
+ ts timestamp)
+USING iceberg
+PARTITIONED BY (bucket(16, id), days(ts), category)
Review comment:
Should this example be a little simpler?
I think it makes sense to have an example for bucketing, but the other two
don't require creating a UDF with the Iceberg transform. Splitting this into
two examples might make sense: one with `days(ts)` and `category` to show how
to add the sort with a SQL `ORDER BY` and using `sortWithinPartitions`, and
then a more complicated one for bucketing.
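A sketch of what the simpler split-out example might look like (illustrative only: the table name `sample_by_day` and the source DataFrame `data` are assumptions, and sorting by `ts` also orders rows by `days(ts)`):

```sql
-- Simpler table: no bucket transform, so no UDF is needed to sort
CREATE TABLE prod.db.sample_by_day (
    id bigint,
    data string,
    category string,
    ts timestamp)
USING iceberg
PARTITIONED BY (days(ts), category)
```

```scala
// Sort rows to match the partition spec before appending.
// Ordering by ts is enough to satisfy the days(ts) transform.
data.sortWithinPartitions("ts", "category")
    .writeTo("prod.db.sample_by_day")
    .append()
```

The bucketing case could then be shown separately, since it is the only one that needs the Iceberg transform UDF.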