rdblue commented on PR #14502:
URL: https://github.com/apache/iceberg/pull/14502#issuecomment-3774072907

   > I am not sure we have a clear way to ban such writers
   
   I would have the REST catalog service remove any `partition.` keys from 
snapshot metadata.
   
   > there can be cases such as a table was un-protected when the snapshot was 
added which contained partition stats
   
   I agree, but snapshots usually don't last very long (days, unless it is the 
current version). So I'd expect that a transition to enforcing no partition 
stats in snapshot summaries on the server side would fix this fairly quickly. 
You could also check for this when loading a table for a client.
   
   If you detect this, you can also remove the stats on the server side. That 
may look similar to what is here, but I think that it is better to have limited 
code in the service for this rather than introduce something in the library. We 
don't want people using this to edit old snapshots, which are effectively 
immutable right now. Introducing this would change that guarantee.
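   
   As a rough illustration of what that limited service-side code could look 
like, here is a minimal sketch that treats a snapshot summary as a plain 
`Map<String, String>`. The class and method names are hypothetical and not part 
of the Iceberg library, and the key prefix is passed in by the caller rather 
than hard-coded, since the exact prefix to strip is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for a catalog service; not an Iceberg API.
final class SnapshotSummarySanitizer {
  private SnapshotSummarySanitizer() {}

  // Returns true if any summary key starts with the given prefix,
  // e.g. to detect partition stats when loading a table for a client.
  static boolean hasKeysWithPrefix(Map<String, String> summary, String prefix) {
    for (String key : summary.keySet()) {
      if (key.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  // Returns a copy of the summary without keys that start with the prefix.
  // The input map is left untouched, so the stored snapshot is not edited.
  static Map<String, String> withoutKeysWithPrefix(
      Map<String, String> summary, String prefix) {
    Map<String, String> sanitized = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : summary.entrySet()) {
      if (!entry.getKey().startsWith(prefix)) {
        sanitized.put(entry.getKey(), entry.getValue());
      }
    }
    return sanitized;
  }
}
```

   A service could call `hasKeysWithPrefix(summary, "partition.")` on load and, 
if it returns true, hand the client the result of `withoutKeysWithPrefix` 
instead of the stored summary.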
   
   > hard to enforce across catalog
   
   I think we'd need to define the use case a bit more clearly, but my initial 
take is that it's up to the catalog. If the primary catalog (the source of 
truth) drops the partition stats, then other catalogs should receive snapshots 
without them. And if the primary doesn't drop stats, then it is up to the 
receiving catalog how it chooses to handle that case. If it needs to drop stats 
for its own security model then I don't see how that would be a problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

