Hello,

We're increasing our use of YARN preemption on our Hadoop clusters, and we've noticed a significant uptick in orphaned data (i.e., data that isn't associated with an Iceberg table but was written by Spark executors in the same application). We suspect it could be due to partially written but uncommitted data: the executor writes the files, but the container is preempted before the information about them is propagated to the driver.
We'll continue to investigate on our side, but I wanted to confirm what the best option is here, and whether this is expected behavior at all. Should we be using magic committers in S3A to stage data with multipart uploads before the commit? Or is this handled by some other Iceberg construct? We're on a slight fork of Iceberg 0.12, with Spark 2.4/3.1.

Thanks in advance!

Jon
