I would be hesitant to turn on any new feature by default, especially for
Spark compaction, which is widely used in production.

+1 for providing a way for users to enable the feature manually.
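
To make the opt-in idea concrete, here is a rough sketch of a flag that
stays off unless the caller enables it. This is a simplified model, not
Iceberg code: the class ExpireOptInSketch and the method retainedSpecIds
are hypothetical names made up for illustration, and only the flag name
'cleanExpiredMetadata' and its 'false' default come from the thread.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not Iceberg code: an opt-in flag that defaults to false.
final class ExpireOptInSketch {
    // Off unless the user explicitly asks for metadata cleanup.
    private boolean cleanExpiredMetadata = false;

    // Fluent setter so callers can opt in per invocation.
    ExpireOptInSketch cleanExpiredMetadata(boolean clean) {
        this.cleanExpiredMetadata = clean;
        return this;
    }

    // Returns the partition spec IDs kept after expiration (simplified).
    // allSpecIds: every spec ID present in table metadata (assumed input).
    // referencedSpecIds: spec IDs still used by the retained snapshots.
    Set<Integer> retainedSpecIds(Set<Integer> allSpecIds, Set<Integer> referencedSpecIds) {
        Set<Integer> kept = new TreeSet<>(allSpecIds);
        if (cleanExpiredMetadata) {
            // Opt-in path: drop specs that no retained snapshot references.
            kept.retainAll(referencedSpecIds);
        }
        // Default path: metadata is left untouched.
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> all = Set.of(0, 1, 2);
        Set<Integer> used = Set.of(2);
        // Default behaviour keeps everything: prints [0, 1, 2]
        System.out.println(new ExpireOptInSketch().retainedSpecIds(all, used));
        // Explicit opt-in removes unused specs: prints [2]
        System.out.println(
            new ExpireOptInSketch().cleanExpiredMetadata(true).retainedSpecIds(all, used));
    }
}
```

With this shape, existing production jobs see no change in behaviour, and
only users who pass the flag get the extra cleanup.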

Gabor Kaszab <gaborkas...@apache.org> wrote (on Fri, Mar 14, 2025,
12:19):

> Hi Iceberg Community,
>
> There were recent additions to RemoveSnapshots to expire unused
> partition specs and schemas. This is controlled by a flag called
> 'cleanExpiredMetadata', which defaults to 'false'. Additionally,
> Spark
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L147>
> and Flink
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/ExpireSnapshotsProcessor.java#L86>
> don't offer a way to set this flag currently.
>
> 1) Default value of RemoveSnapshots.cleanExpiredMetadata
> I'm wondering whether the community would like this flag to default to
> true. The effect would be that each snapshot expiration also cleans up
> the unused partition specs and schemas. This functionality is quite new,
> so the community may want more confidence in it before turning it on by
> default, but I think it's worth considering.
>
> 2) Spark and Flink to support setting this flag
> I think it makes sense to add support in Spark's ExpireSnapshotsProcedure
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ExpireSnapshotsProcedure.java#L116>
> and ExpireSnapshotsSparkAction
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L147>
> as well as in Flink's ExpireSnapshotsProcessor
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/ExpireSnapshotsProcessor.java#L58>
> and ExpireSnapshots
> <https://github.com/apache/iceberg/blob/c02ebe4740b22d6f5a78b636aea2d918037b2751/flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/api/ExpireSnapshots.java#L44>
> to allow setting this flag based on (user) inputs.
>
> WDYT?
>
> Regards,
> Gabor
>