rdblue commented on a change in pull request #1523:
URL: https://github.com/apache/iceberg/pull/1523#discussion_r496257864
##########
File path: site/docs/spark.md
##########
@@ -519,6 +519,59 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+## Writing to partitioned tables
+
+Iceberg requires the data to be sorted according to the partition spec prior to writing to a partitioned table.
+This applies to both writing with SQL and writing with DataFrames.
+
+Suppose we would like to write data to the sample table below:
+
+```sql
+CREATE TABLE prod.db.sample (
+ id bigint,
+ data string,
+ category string,
+ ts timestamp)
+USING iceberg
+PARTITIONED BY (bucket(16, id), days(ts), category)
Review comment:
Should this example be a little simpler?
I think it makes sense to have an example for bucketing, but the other two
don't require creating a UDF with the Iceberg transform. Splitting this into
two examples might make sense: one with `days(ts)` and `category` to show how
to add the sort with a SQL `ORDER BY` and using `sortWithinPartitions`, and
then a more complicated one for bucketing.
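A sketch of what the simpler split-out example might look like (illustrative only: the table name `sample_by_day` and the source DataFrame `data` are assumptions, and sorting by `ts` also orders rows by `days(ts)`):

```sql
-- Simpler table: no bucket transform, so no UDF is needed to sort
CREATE TABLE prod.db.sample_by_day (
    id bigint,
    data string,
    category string,
    ts timestamp)
USING iceberg
PARTITIONED BY (days(ts), category)
```

```scala
// Sort rows to match the partition spec before appending.
// Ordering by ts is enough to satisfy the days(ts) transform.
data.sortWithinPartitions("ts", "category")
    .writeTo("prod.db.sample_by_day")
    .append()
```

The bucketing case could then be shown separately, since it is the only one that needs the Iceberg transform UDF.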