markope opened a new issue #3063:
URL: https://github.com/apache/iceberg/issues/3063


   In your [documentation on partitioned 
writes](https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables) 
you mention that data must be sorted by the partition spec before writing, to 
avoid the "file already closed" error when writing to disk. 
   
   Create a table like so:
   
   ```sql
   CREATE TABLE prod.db.sample (
       id bigint,
       data string,
       category string,
       ts timestamp)
   USING iceberg
   PARTITIONED BY (days(ts), category)
   ```
   
   When inserting into this table with Spark SQL, you then need to order by 
`date_trunc('day', ts)` rather than by `ts` directly: if the `ts` column 
carries sub-day detail, ordering by the raw `ts` interleaves categories within 
the same day partition and breaks the required sort order.
   
   ```sql
   INSERT INTO prod.db.sample
   SELECT id, data, category, ts FROM another_table
   ORDER BY date_trunc('day', ts), category
   ```
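
   A pure-Python sketch (not Spark, and only an illustration of the idea) of why this matters: counting how often the partition key changes in the sorted stream approximates how often a partition writer would have to open a new file. The row data, the `partition_key` helper, and the `writer_opens` counter below are all hypothetical, chosen to mirror the `PARTITIONED BY (days(ts), category)` spec above.

   ```python
   from datetime import datetime

   # Sample rows: (ts, category), all on the same day but with hour detail.
   rows = [
       (datetime(2021, 9, 1, 3), "b"),
       (datetime(2021, 9, 1, 9), "a"),
       (datetime(2021, 9, 1, 14), "b"),
       (datetime(2021, 9, 1, 20), "a"),
   ]

   def partition_key(row):
       ts, category = row
       # Mirrors PARTITIONED BY (days(ts), category).
       return (ts.date(), category)

   def writer_opens(ordered_rows):
       """Count how many times a partition writer would be (re)opened:
       a new file is started whenever the partition key changes."""
       opens = 0
       prev = None
       for row in ordered_rows:
           key = partition_key(row)
           if key != prev:
               opens += 1
               prev = key
       return opens

   # Ordering by raw ts interleaves categories within the same day...
   by_ts = sorted(rows, key=lambda r: r[0])
   # ...while ordering by (day, category) keeps each partition contiguous.
   by_day_cat = sorted(rows, key=partition_key)

   print(writer_opens(by_ts))       # 4 writer opens for only 2 partitions
   print(writer_opens(by_day_cat))  # 2 opens, one per partition
   ```

   With only two partitions, the `ts`-ordered stream forces four writer opens, which is the pattern behind the closed-file error; the truncated ordering opens each partition exactly once.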
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
