mderoy opened a new pull request, #16668:
URL: https://github.com/apache/iceberg/pull/16668

   Today you cannot apply a global equality delete to a table unless 
`specsById` contains an unpartitioned partition spec (in some cases, as spec 0). 
This is misleading, because you can create the equality delete by passing 
`PartitionSpec.unpartitioned()`, but because its specId collides with the 
table's spec, the delete is not registered as a global equality delete.
   
   `PartitionSpec.unpartitioned()` is a singleton constant with `specId = 0`. A 
table's first partition spec is also assigned `specId = 0`. When a global 
equality delete is written using `PartitionSpec.unpartitioned()` against a table 
whose first (and only) spec is partitioned, the delete file stores `specId = 0`. 
At read time, `specsById.get(0)` resolves to the table's partitioned spec rather 
than the unpartitioned constant, so the delete is misclassified as 
partition-scoped: it is indexed under an empty partition key in 
`eqDeletesByPartition` rather than in `globalDeletes`. Since no data file's 
partition matches an empty key, the delete never applies, and deleted rows are 
incorrectly returned to the reader.
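To make the collision concrete, here is a minimal, self-contained Java model of the read-side lookup. It is not Iceberg's code: `Spec`, `EqDelete`, and `isGlobal` are hypothetical stand-ins, and the rule "global only if the spec resolved from `specsById` has no partition fields" is a simplification of the behavior described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the pre-fix classification; not Iceberg's actual code.
public class SpecIdCollisionDemo {

  // Stand-in for a partition spec: an id plus the names of its partition fields.
  record Spec(int specId, List<String> fields) {
    boolean isUnpartitioned() {
      return fields.isEmpty();
    }
  }

  // Stand-in for an equality delete file: the spec id it was written with and its partition tuple.
  record EqDelete(int specId, List<Object> partition) {}

  // The spec is resolved by id at read time; only an unpartitioned spec makes the delete global.
  static boolean isGlobal(EqDelete delete, Map<Integer, Spec> specsById) {
    return specsById.get(delete.specId()).isUnpartitioned();
  }

  public static void main(String[] args) {
    // The writer's unpartitioned spec has id 0 ...
    Spec unpartitioned = new Spec(0, List.of());
    // ... and so does the table's first (and only) spec, which is partitioned.
    Spec tableSpec = new Spec(0, List.of("day"));

    Map<Integer, Spec> specsById = new HashMap<>();
    specsById.put(tableSpec.specId(), tableSpec);

    // The delete stores specId 0 and carries no partition values.
    EqDelete delete = new EqDelete(unpartitioned.specId(), List.of());

    // specsById.get(0) returns the partitioned spec, so the delete is not treated as global.
    System.out.println("global? " + isGlobal(delete, specsById)); // prints "global? false"
  }
}
```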
   
   Note: I suspect there is a better fix, but I don't think we can reserve 
specId 0 for the unpartitioned spec, since existing tables already break that 
convention. We could use -1 instead, but that would not fix the issue for 
existing tables. This fix simply treats delete files with empty partition data 
as global.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]