pvary commented on issue #5339:
URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1193063396

   > Yea I think there was an old similar discussion here: #3064. I think we 
   > can do a per check of all files added in same transaction, but anything 
   > beyond that involves an expensive spark call to check for duplicates in 
   > the table itself?
   
   Thanks @szehon-ho, I was not aware of the old thread. It seems like a 
reasonable compromise to accept duplicated files if we do not parse the whole 
table metadata anyway.
   How much of the metadata is parsed when we have a `Table` object at hand? 
Which metadata files do we read when we commit something? Does anyone have a 
quick answer for this, or shall I check?
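   For illustration, the cheap per-transaction check mentioned in the quoted 
comment could be sketched roughly like this. This is a minimal, self-contained 
sketch, not Iceberg's actual API: the `findDuplicates` helper and the example 
paths are hypothetical stand-ins for scanning the file paths of the `DataFile`s 
added in a single commit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // Hypothetical helper: given the paths of all data files added in one
    // transaction, return the paths that appear more than once. This only
    // catches duplicates within the same commit; finding duplicates against
    // files already in the table would require reading table metadata.
    static Set<String> findDuplicates(List<String> addedPaths) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String path : addedPaths) {
            // Set.add returns false if the path was already seen in this commit
            if (!seen.add(path)) {
                duplicates.add(path);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "s3://bucket/data/file-a.parquet",
            "s3://bucket/data/file-b.parquet",
            "s3://bucket/data/file-a.parquet");
        // Prints the duplicated paths within this single commit
        System.out.println(findDuplicates(paths));
    }
}
```

   Such a check is O(n) in the number of files added by the commit and needs no 
table scan, which is why it avoids the expensive Spark call the quoted comment 
mentions.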
   
   Thanks everyone for the answers!
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

