gustavoatt commented on PR #5148: URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1169999432
@rdblue let me provide more context. At Airbnb we are trying to fully integrate Airflow partition sensing with Iceberg tables. One problem we have run into is keeping track of empty partitions that are added explicitly via a command like:

```sql
INSERT OVERWRITE $TBL PARTITION (parKey=partVal,...)
SELECT * FROM $EMPTY_SELECTION
```

Currently there is no way for us to know, for sensing purposes, whether such an insert took place. We have decided to track these by adding custom properties to `Snapshot::summary` via a subclass of `BaseOverwriteFiles` that captures the expression passed to `overwriteByRowFilter`, so we can tell whether we were inserting a single partition.

We want to do the same for `DeleteFiles`, so that we know when an empty partition "gets deleted", if that makes sense. Unfortunately the default implementation of `DeleteFiles` is protected, which means we cannot subclass it outside of the iceberg package.

Note that we are hoping to keep track of empty partitions without making any changes to the way we write to Spark tables, so injecting our custom `OverwriteFiles` and `DeleteFiles` via our custom catalog is the most transparent way we have found to track these expressions without having to change user job code.
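To make the subclassing idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real Iceberg classes: `SketchOverwrite`, `TrackingOverwrite`, and the property key `sketch.overwrite-filter` are hypothetical stand-ins, and the row filter is modeled as a plain string rather than an Iceberg expression. It only illustrates how a subclass could capture the filter and record it in the summary that a commit would publish.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionTrackingSketch {

  // Hypothetical stand-in for an overwrite operation; this is NOT the Iceberg API.
  static class SketchOverwrite {
    private final Map<String, String> summary = new LinkedHashMap<>();

    // Records a custom summary property, similar in spirit to setting a snapshot summary entry.
    SketchOverwrite set(String property, String value) {
      summary.put(property, value);
      return this;
    }

    // In a real table format this would stage the removal of matching data files.
    // The sketch does nothing here so that only the tracking logic is visible.
    SketchOverwrite overwriteByRowFilter(String filter) {
      return this;
    }

    // Returns a copy of the summary that a commit would publish.
    Map<String, String> commit() {
      return new LinkedHashMap<>(summary);
    }
  }

  // Subclass that remembers the filter it was given, even if no rows are written.
  static class TrackingOverwrite extends SketchOverwrite {
    // Hypothetical property name, chosen for this sketch only.
    static final String FILTER_PROPERTY = "sketch.overwrite-filter";

    @Override
    TrackingOverwrite overwriteByRowFilter(String filter) {
      super.overwriteByRowFilter(filter);
      set(FILTER_PROPERTY, filter);
      return this;
    }
  }

  public static void main(String[] args) {
    // An empty INSERT OVERWRITE of one partition still leaves a trace in the summary.
    Map<String, String> summary =
        new TrackingOverwrite().overwriteByRowFilter("parKey = 'partVal'").commit();
    System.out.println(summary);
  }
}
```

A sensor could then read the recorded property from the latest summary to decide whether a partition was written, even when it is empty. The same shape would apply to a delete operation if its default implementation could be subclassed.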
