markope opened a new issue #3063: URL: https://github.com/apache/iceberg/issues/3063
In your [documentation on partitioned writes](https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables) you mention that the data must be ordered by the partition spec to avoid the "file already closed" error when writing to disk. Given a table created like so:

```sql
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string,
    ts timestamp)
USING iceberg
PARTITIONED BY (days(ts), category)
```

you then need to insert into the table with Spark SQL using `date_trunc("day", ts)` in the `ORDER BY`, because if the `ts` column contains hour-level detail, ordering by the raw `ts` alone can break the category clustering:

```sql
INSERT INTO prod.db.sample
SELECT id, data, category, ts FROM another_table
ORDER BY date_trunc("day", ts), category
```
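To make the argument concrete, here is a minimal Python sketch (all names and sample rows are hypothetical, not from Iceberg or Spark) showing why sorting by a raw timestamp with hour-level detail can interleave categories within the same day, while sorting by `(day, category)` keeps each partition's rows in one contiguous run:

```python
from datetime import datetime

# Hypothetical sample rows: (ts, category); ts carries hour-level detail.
rows = [
    (datetime(2021, 9, 1, 1), "b"),
    (datetime(2021, 9, 1, 2), "a"),
    (datetime(2021, 9, 1, 3), "b"),
]

def partition_key(row):
    """Models the table's partition spec: (days(ts), category)."""
    ts, category = row
    return (ts.date(), category)

def is_clustered(sorted_rows):
    """True if each (day, category) partition key appears as one contiguous run.

    If a key reappears after a different key, a partition-spec-ordered writer
    would have already closed that partition's file and would have to reopen it.
    """
    seen = set()
    prev = None
    for row in sorted_rows:
        key = partition_key(row)
        if key != prev:
            if key in seen:
                return False  # key reappears -> file for it was already closed
            seen.add(key)
            prev = key
    return True

# Ordering by raw ts interleaves categories within the same day:
by_ts = sorted(rows, key=lambda r: r[0])
print(is_clustered(by_ts))       # False

# Ordering by (date_trunc('day', ts), category) clusters each partition:
by_day_cat = sorted(rows, key=partition_key)
print(is_clustered(by_day_cat))  # True
```

In the first ordering the `(2021-09-01, "b")` key appears, is interrupted by `(2021-09-01, "a")`, and then reappears, which is exactly the situation that triggers the "file already closed" error.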
