RussellSpitzer commented on issue #7657: URL: https://github.com/apache/iceberg/issues/7657#issuecomment-1554757626
You can just look at the plan details to see what "existingRDD" is being generated from, although its source should also be visible in your own code. Again, and I can't emphasize this enough, there is no way an "Insert" can read the data already in your table. Only an Update, Merge, or Delete can do that. You really should share the actual code (or SQL) you are using for the insert, because comparing things based on their approximate size is probably not going to get us very far.

And again, on the coincidence with rewrite_data_files: the operation can only do one thing to modify the table for future readers, and that is to change the metadata.json pointed to in the catalog. All Iceberg transactions function on this basic principle. If that pointer did not change, there is no way the command can influence future readers. The command may leave garbage files behind if it failed, though, hence my thoughts about resource issues. Any of the other things I could imagine, like accidentally removing existing metadata or data files (which rewrite_data_files does not do, but there are mysteries everywhere), would result in abrupt errors.
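To make the "only the catalog pointer matters" argument concrete, here is a toy model of that commit pattern. It is a minimal sketch under stated assumptions, not Iceberg's actual API: `InMemoryCatalog`, `FileStore`, `commit`, and `visible_files` are hypothetical names. The model writes new data and metadata files first, then publishes them with a single compare-and-swap of the table's metadata pointer. A job that dies before the swap leaves orphaned files in storage, but readers never see them.

```python
import threading
import uuid


class InMemoryCatalog:
    """Hypothetical catalog: maps a table name to its current metadata.json path."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def current(self, table):
        return self._pointers.get(table)

    def swap(self, table, expected, new):
        """Atomically replace the pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


class FileStore:
    """Hypothetical object store holding both data and metadata files."""

    def __init__(self):
        self.files = {}

    def write(self, path, content):
        self.files[path] = content


def visible_files(catalog, store, table):
    """What a future reader sees: only files listed by the current metadata."""
    meta = catalog.current(table)
    return [] if meta is None else store.files[meta]["data_files"]


def commit(catalog, store, table, new_data_files, fail_before_swap=False):
    base = catalog.current(table)
    previous = visible_files(catalog, store, table)
    # 1. Write new data files; no reader can see them yet.
    for path in new_data_files:
        store.write(path, b"...")
    # 2. Write a brand-new metadata file describing the full new table state.
    new_meta = f"metadata/{uuid.uuid4()}.metadata.json"
    store.write(new_meta, {"parent": base, "data_files": previous + new_data_files})
    if fail_before_swap:
        # Simulated failure: the files above become garbage, pointer is untouched.
        raise RuntimeError("job failed before commit")
    # 3. The only step that affects future readers: swap the catalog pointer.
    if not catalog.swap(table, base, new_meta):
        raise RuntimeError("concurrent commit detected; retry from the new base")
    return new_meta


catalog, store = InMemoryCatalog(), FileStore()
commit(catalog, store, "db.t", ["data/a.parquet"])
try:
    commit(catalog, store, "db.t", ["data/b.parquet"], fail_before_swap=True)
except RuntimeError:
    pass
print(visible_files(catalog, store, "db.t"))  # ['data/a.parquet']
print("data/b.parquet" in store.files)  # True: orphaned file left behind
```

The compare-and-swap in `swap` is what makes the pointer change all-or-nothing: if the pointer moved underneath a writer, its commit is rejected rather than silently overwriting someone else's state.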
