singhpk234 commented on PR #14502:
URL: https://github.com/apache/iceberg/pull/14502#issuecomment-3771588110

   Thank you for the feedback, @rdblue!
   
   > the right solution is to stop embedding partition information in the 
snapshot summary and instead capture that data (if it is needed) using the 
metrics reporting framework and REST endpoint
   
   Agreed, I think it's an anti-pattern that we leak partition information 
here, especially since there are multiple other ways to achieve the same 
thing. I am not sure we have a clear way to ban such writers, though; maybe an 
end user built a dashboard on top of the summary because it's convenient for 
them?
   
   >  I'd recommend solving that problem more directly with something like a 
catalog override that suppresses them.
   
   IIUC, there can be cases where a table was unprotected when a snapshot 
containing partition stats was added, but is protected now (we can always 
enforce not adding the partition summary, irrespective of whether the table is 
protected or not). Maybe this is a check we would then need to do as part of 
policy (RAP) attachment, making the attachment fail. But I think policies are 
sometimes attached via TAGs, so maybe we would have to fail at runtime 
instead: "hey, this table is protected but it has sensitive info which the 
catalog can't hide", throw a 403, and prompt the user to fix it. Would 
expiring the snapshot be the only solution then? Or do we expect the user to 
rewrite the `metadata.json` without such a summary and then do a force 
register?
   
   > Or just drop them at the catalog level when processing AddSnapshot changes.
   
   My understanding is that unless we spec this out, it would be hard to 
enforce across catalogs. For example, with federation, where one defines a 
policy on a federated table (catalog C1 federating to catalog C2), we will run 
into cases where AddSnapshot in C2 didn't enforce this. The table then can't 
be queried, and we fail at runtime when it is queried from C1, since the 
policies are defined in C1.
   
   Hence I thought having something like metadata projection would give 
catalogs some flexibility to properly redact the info (since the snapshot 
summary is optional) without burdening the end user.
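   
   To make the idea concrete, here is a rough sketch of what such a projection 
could look like on the catalog side when building a load-table response. It is 
only an illustration: the function names and the `table_is_protected` flag are 
hypothetical, and the key names (`partitions.` prefix, 
`partition-summaries-included`) are my assumption about what writers put in 
the summary.

```python
# Hypothetical sketch: redact partition-level entries from a snapshot summary.
# The key names below are assumptions about what writers emit; they are not
# taken from this PR.
PARTITION_PREFIX = "partitions."
PARTITION_FLAG = "partition-summaries-included"


def redact_summary(summary):
    """Return a copy of the summary without per-partition entries."""
    redacted = {
        key: value
        for key, value in summary.items()
        if not key.startswith(PARTITION_PREFIX)
    }
    # Keep the flag consistent with the redacted content, if it was present.
    if PARTITION_FLAG in redacted:
        redacted[PARTITION_FLAG] = "false"
    return redacted


def project_snapshot(snapshot, table_is_protected):
    """Project a snapshot for a response; the stored snapshot is not mutated."""
    if not table_is_protected or "summary" not in snapshot:
        return snapshot
    return {**snapshot, "summary": redact_summary(snapshot["summary"])}
```

   Because this only changes what is returned to the client, the stored 
`metadata.json` stays as it is, so nothing has to be expired or rewritten.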
   
   Please let me know your thoughts on the above.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

