RussellSpitzer commented on issue #7657:
URL: https://github.com/apache/iceberg/issues/7657#issuecomment-1554757626

   You can look at the plan details to see what the "existingRDD" is being 
generated from, although this should already be visible in your code. Again, 
and I can't emphasize this enough, there is no way an "Insert" can read the 
data already in your table. Only an Update, Merge or Delete can do that.
   
   You really should share the actual code (or SQL) you are using for the 
insert, because comparing things based on their approximate size is probably 
not going to get us very far.
   
   And again on the coincidence of rewrite_data_files: the operation can do 
only one thing to modify the table for future readers, and that is to change 
the metadata.json pointed to in the catalog. All Iceberg transactions function 
on this basic principle. If that pointer did not change, there is no way the 
command can influence future readers. The command may leave garbage files 
behind if it failed, though, hence my thoughts about resource issues. Any of 
the other things I could imagine, like removing existing metadata or data files 
accidentally (which rewrite_data_files does not do, but there are mysteries 
everywhere), would result in abrupt errors.
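
   To make the pointer argument concrete, here is a rough sketch in plain 
Python. It is a toy model, not Iceberg's API: `ToyCatalog`, `swap`, 
`run_rewrite` and the file paths are all hypothetical names made up for 
illustration. It only shows the idea that a commit is a compare-and-swap of a 
single metadata pointer, so a failed run leaves the pointer alone and at worst 
leaves files behind that nothing references.

```python
import threading


class ToyCatalog:
    """Hypothetical stand-in for a catalog: one metadata.json path per table."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        # What every future reader would resolve first.
        return self._pointers.get(table)

    def swap(self, table, expected, new):
        # Compare-and-swap: move the pointer only if it still equals `expected`.
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


def run_rewrite(catalog, table, new_metadata, fail_before_commit=False):
    """Toy rewrite: write new files, then try to commit by swapping the pointer."""
    base = catalog.current(table)
    # Files written before the commit; they are unreferenced until the swap succeeds.
    written = [f"data/rewritten-{i}.parquet" for i in range(2)]
    if fail_before_commit:
        # Assumed failure mode (e.g. resources ran out): no swap was attempted.
        return False, written
    committed = catalog.swap(table, base, new_metadata)
    return committed, ([] if committed else written)


catalog = ToyCatalog()
catalog.swap("db.tbl", None, "metadata/v1.metadata.json")

# A failed run: the pointer is untouched, only garbage files remain.
ok, garbage = run_rewrite(
    catalog, "db.tbl", "metadata/v2.metadata.json", fail_before_commit=True
)
pointer_after_failure = catalog.current("db.tbl")

# A successful run: the pointer moves, and only now do readers see a change.
ok2, garbage2 = run_rewrite(catalog, "db.tbl", "metadata/v3.metadata.json")
pointer_after_success = catalog.current("db.tbl")
```

   In this model a reader that starts from `catalog.current("db.tbl")` sees 
exactly the same table after the failed run as before it, which is the point 
above: without a pointer change, the command has no way to affect later reads.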


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]