HeartSaVioR commented on a change in pull request #1523:
URL: https://github.com/apache/iceberg/pull/1523#discussion_r496339735
##########
File path: site/docs/spark.md
##########
@@ -519,6 +519,59 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+## Writing against partitioned table
+
+Iceberg requires the data to be sorted according to the partition spec
+prior to writing against a partitioned table.
Review comment:
Please correct me if I'm missing something. (Sorry for being overly pedantic
technically; I'm also still learning Iceberg, so I just want to understand this correctly.)
If I understand correctly, the Iceberg Spark writer requires the data to be
sorted according to the partition spec within each task (Spark partition),
not merely clustered by partition value.
The query below fails:
```scala
// Assumes a spark-shell session: spark.implicits._ for toDF, and
// org.apache.spark.sql.functions.col for the repartition expression.
import org.apache.spark.sql.functions.col
import spark.implicits._

spark.sql("""
  CREATE TABLE iceberg_catalog.default.sample1 (
    id bigint,
    data string,
    category string)
  USING iceberg
  PARTITIONED BY (category)
""")

val data = (0 to 100000).map { id =>
  (id, s"hello$id", s"category-${id % 100}")
}

// Clustered by the partition column, but sorted within each task by id
// rather than by category: the write below fails.
data.toDF("id", "data", "category")
  .repartition(100, col("category"))
  .sortWithinPartitions("id")
  .writeTo("iceberg_catalog.default.sample1")
  .append()
```
Mentioning the global and local sort options here would be a nice addition.
Thanks! Will add.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]