wypoon commented on issue #14735: URL: https://github.com/apache/iceberg/issues/14735#issuecomment-3899783323
@thomas-pfeiffer this is not a bug. `compute_table_stats` failed because the snapshot that the stats were computed for is no longer the current snapshot. You can see this from the fact that the base metadata file (the table's metadata file when the procedure started) is not the same as the current metadata file.

In general, when another write has committed a change since an Iceberg write started, the write does not recompute (rewrite) the data it has already written. Instead, it checks whether the written data is still compatible with the new state of the table. If what has been written can be reconciled with the new state, it writes new metadata (not new data) and tries to commit again; this is called rebasing on the new commit. This is what is meant by commit retry; it is not retrying the whole operation.

For `compute_table_stats`, we compute the NDV stats for the current snapshot (since you did not specify a snapshot). If the current snapshot has changed, what was computed is no longer applicable, and that is why the commit fails.

In a Spark scan, we check whether there are stats for the snapshot we're reading, and load them if so. Thus, if we're reading the current snapshot rather than doing time travel, there is no benefit to having table stats for older snapshots. For time travel reads, on the other hand, I could see the potential benefit of allowing `compute_table_stats` to commit the statistics for the no-longer-current snapshot.

Finally, for compaction operations such as `rewrite_data_files` and `rewrite_position_delete_files`, there is a benefit to partial progress (e.g., some but not all partitions of a table being successfully compacted), hence the provision of a flag for enabling it. For `compute_table_stats`, this is not applicable.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
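To make the difference between a commit retry and retrying the whole operation concrete, here is a rough toy model in Python. It is only a sketch: `ToyTable`, `commit_append`, `commit_stats` and `CommitFailed` are hypothetical names made up for illustration, not Iceberg's classes or API. It assumes a single metadata version counter that is swapped atomically.

```python
from dataclasses import dataclass, field


class CommitFailed(Exception):
    """Raised when a pending change cannot be applied to the current table state."""


@dataclass
class ToyTable:
    # Hypothetical stand-in for table metadata; not Iceberg's real classes.
    version: int = 0
    current_snapshot_id: int = 1
    stats_by_snapshot: dict = field(default_factory=dict)

    def swap(self, base_version, update):
        """Apply `update` only if the caller's base metadata is still current."""
        if base_version != self.version:
            raise CommitFailed("base metadata is no longer current")
        update(self)
        self.version += 1


def commit_append(table, base_version, new_snapshot_id, retries=3):
    """Data files already written stay valid, so a retry only rebuilds metadata."""
    for _ in range(retries + 1):
        try:
            table.swap(
                base_version,
                lambda t: setattr(t, "current_snapshot_id", new_snapshot_id),
            )
            return
        except CommitFailed:
            base_version = table.version  # rebase on the latest metadata, try again
    raise CommitFailed("retries exhausted")


def commit_stats(table, snapshot_id, ndv):
    """Stats describe one snapshot; if it is no longer current, there is nothing to rebase."""
    if table.current_snapshot_id != snapshot_id:
        raise CommitFailed(f"snapshot {snapshot_id} is no longer current")
    table.swap(table.version, lambda t: t.stats_by_snapshot.update({snapshot_id: ndv}))


table = ToyTable()
base = table.version                        # metadata seen when every job started
stats_snapshot = table.current_snapshot_id  # snapshot the stats job scans

commit_append(table, base, new_snapshot_id=2)  # a concurrent writer commits first
commit_append(table, base, new_snapshot_id=3)  # stale base: rebases and succeeds

try:
    commit_stats(table, stats_snapshot, {"id": 100})
    stats_failed = False
except CommitFailed:
    stats_failed = True  # the stats were computed for a snapshot that moved on
```

In this toy model the second append succeeds without redoing any work because only the metadata has to be rebuilt, while the stats commit has no equivalent way to reconcile with the new state and simply fails.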
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
