Hi all,

Recently I played around with a partitioned Iceberg table in Spark and realized that writing to it requires sorting the data manually. I had to google to find a workaround. I guess there's no documentation for this, unless I'm missing something.
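
For anyone who has not hit this yet, here is a minimal sketch of why the sort matters. It is a toy model in plain Python, not Iceberg or Spark code: the `ClusteredWriter` class, the `write_all` helper, and the row shape are all hypothetical, and it assumes the writer keeps a single open file per task and refuses to reopen a partition it has already closed.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical row shape: (partition value, payload).
Row = Tuple[str, int]


class ClusteredWriter:
    """Toy writer: one open file at a time; a closed partition cannot be reopened."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.closed: Set[str] = set()
        self.files: Dict[str, List[Row]] = {}

    def write(self, row: Row) -> None:
        part = row[0]
        if part != self.current:
            # Switching partitions: the new one must not have been closed before.
            if part in self.closed:
                raise ValueError(f"already closed file for partition {part!r}")
            if self.current is not None:
                self.closed.add(self.current)
            self.current = part
            self.files[part] = []
        self.files[part].append(row)


def write_all(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Write every row through a fresh writer and return the files it produced."""
    writer = ClusteredWriter()
    for row in rows:
        writer.write(row)
    return writer.files


rows = [("2020-01-02", 1), ("2020-01-01", 2), ("2020-01-02", 3)]

# Unsorted input revisits a partition after its file was closed, so it fails.
try:
    write_all(rows)
    unsorted_ok = True
except ValueError:
    unsorted_ok = False

# Sorting on the partition value first keeps each partition contiguous.
sorted_files = write_all(sorted(rows, key=lambda r: r[0]))
```

In Spark terms, the corresponding workaround would be to sort on the partition column(s) before the write, for example with `sortWithinPartitions` on the DataFrame. The exact columns depend on the table's partition spec, so treat this as an illustration rather than a recipe.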

While I encountered this with the DataFrame writer, I suspect there are more limitations, as the root issue is that SPARK-23889 [1] is missing. My suspicion is that any write would be affected, including CTAS-style writes (such as copying from another table with a different partitioning), as there's no way to enforce requirements based on the partitioning in the DSv2 writer. Do I understand this correctly?

I feel we may need to put effort into pushing SPARK-23889 forward for Iceberg (or consider moving down to the DSv1 writer), as I think the workaround is unacceptable for many end users. We probably also need to document the impact and the workaround until the issue is fixed.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-23889