Impact on Spark-Iceberg usage on missing to enforce clustering/sort requirement (SPARK-23889)

Jungtaek Lim Wed, 16 Sep 2020 16:27:31 -0700

Hi all,

Recently I played around with a partitioned Iceberg table in Spark, and
realized that writing to it requires a manual sort. I had to google to find
a workaround - I guess there's no documentation, unless I'm missing
something.
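
To illustrate why a sort would be needed, here is a toy model in plain
Python. It is only a sketch of my understanding, not Iceberg code: it
assumes the partitioned writer keeps a single file open at a time and
refuses to reopen a partition it has already closed. The class
ClusteredPartitionWriter, the helper write_all and the sample rows are all
hypothetical.

```python
# Toy model (hypothetical, not Iceberg code): a writer that keeps only one
# partition's file open at a time and refuses to reopen a closed partition.
class ClusteredPartitionWriter:
    def __init__(self):
        self.current = None   # partition whose file is currently open
        self.closed = set()   # partitions whose files are already closed
        self.files = {}       # partition -> rows written to its file

    def write(self, partition, row):
        if partition != self.current:
            if partition in self.closed:
                # Input was not clustered by partition.
                raise ValueError(f"already closed file for partition: {partition}")
            if self.current is not None:
                self.closed.add(self.current)
            self.current = partition
            self.files[partition] = []
        self.files[partition].append(row)


def write_all(rows, key):
    """Feed rows to a fresh writer, using key(row) as the partition value."""
    writer = ClusteredPartitionWriter()
    for row in rows:
        writer.write(key(row), row)
    return writer.files


rows = [("2020-09-15", "a"), ("2020-09-16", "b"), ("2020-09-15", "c")]
by_day = lambda r: r[0]

# Unsorted input revisits 2020-09-15 after its file was closed.
try:
    write_all(rows, by_day)
    unsorted_ok = True
except ValueError:
    unsorted_ok = False

# The manual workaround: sort by the partition value before writing.
sorted_files = write_all(sorted(rows, key=by_day), by_day)
```

In Spark the corresponding step would presumably be sorting by the
partition columns (for example with sortWithinPartitions) before the write,
which is exactly what the user has to remember to do by hand today.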


While I encountered this with the DataFrame writer, I suspect there are
more limitations, as the root issue is that SPARK-23889 [1] is not yet
resolved. My suspicion is that any write would be affected, including
CTAS-like operations (such as copying from another table with a different
partitioning), as there's no way to enforce requirements based on
partitioning in the DSv2 writer.

Do I understand this correctly? I feel we may need to put effort into
pushing SPARK-23889 forward for Iceberg (or consider falling back to the
DSv1 writer), as I think the workaround is unacceptable for many end users.

We probably also need to document the impact and the workaround until we
fix the issue.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-23889
