rdblue commented on PR #14502: URL: https://github.com/apache/iceberg/pull/14502#issuecomment-3774072907
> I am not sure we have a clear way to ban such writers

I would have the REST catalog service remove any `partition.` keys from snapshot metadata.

> there can be cases such as a table was un-protected when the snapshot was added which contained partition stats

I agree, but snapshots usually don't last very long (days, unless it is the current version). So I'd expect that transition to enforcing no partition stats in snapshot summaries on the server side would fix this fairly quickly.

You could also check for this when loading a table for a client. If you detect this, you can also remove the stats on the server side. That may look similar to what is here, but I think that it is better to have limited code in the service for this rather than introduce something in the library. We don't want people using this to edit old snapshots, which are effectively immutable right now. Introducing this would change that guarantee.

> hard to enforce across catalog

I think we'd need to define the use case a bit more clearly, but my initial take is that it's up to the catalog. If the primary catalog (the source of truth) drops the partition stats, then other catalogs should receive snapshots without them. And if the primary doesn't drop stats, then it is up to the receiving catalog how it chooses to handle that case. If it needs to drop stats for its own security model then I don't see how that would be a problem.
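
A minimal sketch of the server-side filtering described above, assuming the catalog service handles each snapshot summary as a plain string-to-string map. The function names and the sample summary values are hypothetical and not part of Iceberg or this PR. The `partition.` prefix is taken from the comment as written, so check it against the keys your writers actually emit.

```python
from typing import Dict

# Assumption: prefix as written in the comment; verify against real summaries.
PARTITION_KEY_PREFIX = "partition."


def has_partition_keys(summary: Dict[str, str],
                       prefix: str = PARTITION_KEY_PREFIX) -> bool:
    """Return True if any summary key starts with the given prefix."""
    return any(key.startswith(prefix) for key in summary)


def strip_partition_keys(summary: Dict[str, str],
                         prefix: str = PARTITION_KEY_PREFIX) -> Dict[str, str]:
    """Return a copy of the summary without keys that start with the prefix.

    The input map is left untouched; the service decides where the filtered
    copy is used (on commit, or when loading a table for a client).
    """
    return {key: value for key, value in summary.items()
            if not key.startswith(prefix)}


# Hypothetical summary, for illustration only.
example_summary = {
    "operation": "append",
    "added-data-files": "2",
    "partition.region=eu": "added-data-files=2",
}

filtered = strip_partition_keys(example_summary)
```

Because the filter returns a new map, the same helper could be applied both when a commit arrives and when a table is loaded, without the service mutating stored snapshots in place.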
