rdblue commented on pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370#issuecomment-806070509


   Thanks for looking into this, @wynjauu. It's always good to check places 
that may benefit from a refactor. Here, I don't think that this refactor is a 
good idea. There are good reasons why the code is structured this way.
   
   First, the call to `encryption.decrypt` passes all of the encrypted input 
files at the same time by passing a single iterator. That allows an encryption 
manager to do a single batch call to fetch the keys for those files, which is 
much more important than avoiding extra loops.
   
   Second, the same delete file may be used for more than one data file. In the 
rewrite, `decrypt` would be called once for each copy of the same file rather 
than just once, and a duplicate would result in an exception from the immutable 
map builder. That's why we use a mutable map of location to key metadata and 
then create encrypted input files from the entries.
   
   While this does more loops, it is actually more performant and correct to do 
it this way. It is much more important to deduplicate decryption and to use the 
batch decrypt call than to avoid an extra loop or two over a small collection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to