gustavoatt commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1169999432

   @rdblue let me provide more context. At Airbnb, we are trying to fully 
integrate Airflow partition sensing with Iceberg tables. One problem we have 
encountered is keeping track of empty partitions that are added explicitly 
with a command like:
   
   ```sql
   INSERT OVERWRITE $TBL
   PARTITION (parKey=partVal,...)
   SELECT *
   FROM $EMPTY_SELECTION
   ```
   
   Currently there is no way for us to know, for sensing purposes, whether 
such an insert took place. We have decided to track these inserts by adding 
custom properties to `Snapshot::summary` via a subclass of `BaseOverwriteFiles` 
that captures the expression passed to `overwriteByRowFilter` to find out 
whether we were overwriting a single partition.
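   
   To make the idea concrete, here is a rough sketch of the pattern. It uses 
   small stand-in types rather than Iceberg's real classes, so `Overwrite`, 
   `SimpleOverwrite`, `TrackingOverwrite`, the string filter, and the property 
   key are all hypothetical names chosen for illustration:
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Stand-in for the small slice of an overwrite operation used here.
   // This is NOT Iceberg's OverwriteFiles interface.
   interface Overwrite {
       Overwrite overwriteByRowFilter(String filter);
       Overwrite set(String property, String value);
       Map<String, String> summary();
   }

   // Stand-in base implementation: it only stores summary properties.
   class SimpleOverwrite implements Overwrite {
       private final Map<String, String> summary = new LinkedHashMap<>();

       @Override
       public Overwrite overwriteByRowFilter(String filter) {
           return this; // a real implementation would record the filter for the commit
       }

       @Override
       public Overwrite set(String property, String value) {
           summary.put(property, value);
           return this;
       }

       @Override
       public Map<String, String> summary() {
           return summary;
       }
   }

   // Subclass that captures the row filter and records it as a summary property,
   // so a sensor can later see that the partition was written even if it is empty.
   class TrackingOverwrite extends SimpleOverwrite {
       // Hypothetical property key, not one defined by Iceberg.
       static final String OVERWRITTEN_FILTER_PROP = "sensing.overwritten-filter";

       @Override
       public Overwrite overwriteByRowFilter(String filter) {
           set(OVERWRITTEN_FILTER_PROP, filter);
           return super.overwriteByRowFilter(filter);
       }
   }
   ```
   
   A sensor would then look for the custom key in the snapshot summary instead 
   of checking whether any data files exist for the partition.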
   
   We want to do the same for `DeleteFiles` so that we know when an empty 
partition "gets deleted", if that makes sense. Unfortunately, the default 
implementation of `DeleteFiles` is protected, which means we cannot subclass it 
outside of the Iceberg package.
   
   Note that we are hoping to keep track of empty partitions without having to 
make any changes to the way we write to Spark tables. Injecting our custom 
`OverwriteFiles` and `DeleteFiles` via our custom catalog is the most 
transparent way we have found to keep track of these expressions without having 
to change the code of user jobs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

