wypoon commented on issue #14735:
URL: https://github.com/apache/iceberg/issues/14735#issuecomment-3899783323

   @thomas-pfeiffer this is not a bug. The `compute_table_stats` procedure 
failed because the snapshot that the stats were computed for is no longer the 
current snapshot, as reflected by the fact that the base metadata file (which 
was the table's metadata file when the procedure started) is not the same as 
the current metadata file.
   In general, when another write has committed a change since an Iceberg 
write started, the write does not recompute (rewrite) the data it has already 
written. Instead, it checks whether that data is still compatible with the new 
state of the table. If what has been written can be reconciled with the new 
state, it writes new metadata (not new data) and tries to commit again; this 
is called rebasing on the new commit. This is what is meant by commit retry; 
it is not retrying the whole operation.
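
   As a rough illustration of the commit retry described above, here is a toy 
model in Python. It is only a sketch and not Iceberg's API: `ToyTable`, 
`CommitConflict` and `append_with_retry` are hypothetical names, and the 
compatibility check is simplified by assuming that an append is compatible 
with any new table state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: a metadata version plus a set of data files."""
    version: int = 0
    files: set = field(default_factory=set)


class CommitConflict(Exception):
    """Raised when the commit cannot be completed within the retry budget."""


def append_with_retry(table, base_version, new_files, max_retries=3):
    """Commit already-written files; on conflict, rebase the metadata and retry.

    The data files in new_files were written once, before this call.
    Only the metadata update is retried, never the data write.
    """
    attempts = 0
    while True:
        attempts += 1
        if table.version == base_version:
            # No concurrent commit since we last looked: publish new metadata.
            table.files |= set(new_files)
            table.version += 1
            return attempts
        if attempts > max_retries:
            raise CommitConflict("gave up after %d attempts" % attempts)
        # Another writer committed first. Assumption: an append is compatible
        # with any new table state, so rebase on the new version and retry.
        base_version = table.version


table = ToyTable()
base = table.version            # our writer starts here and writes "b.parquet"
table.files.add("a.parquet")    # a concurrent writer commits first
table.version += 1

attempts = append_with_retry(table, base, ["b.parquet"])
print(attempts, table.version, sorted(table.files))
```

   In this sketch the first attempt sees a changed version, rebases, and the 
second attempt succeeds without "b.parquet" being rewritten.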
   For `compute_table_stats`, we are computing the NDV (number of distinct 
values) stats for the current snapshot (since you did not specify a snapshot). 
If the current snapshot has changed, then what is computed is no longer 
applicable, and that is why the commit fails. In a Spark scan, we check to see 
if there are stats for the snapshot we're reading, and load the stats for the 
snapshot if so. Thus, if we're not doing time travel but reading the current 
snapshot, there is no benefit to having table stats for older snapshots. For 
time travel reads, on the other hand, I could see the potential benefit of 
allowing `compute_table_stats` to commit the statistics for the 
no-longer-current snapshot.
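
   To make the reasoning above concrete, here is a small Python sketch. It is 
a toy model under stated assumptions, not Iceberg or Spark code: the table is 
a plain dict, and `commit_stats`, `stats_for_scan` and `StaleSnapshotError` 
are hypothetical names. It assumes the stats commit is rejected whenever the 
current snapshot differs from the one the stats were computed for, and that a 
scan only looks up stats for the snapshot it reads.

```python
class StaleSnapshotError(Exception):
    """Raised when stats were computed for a snapshot that is no longer current."""


def commit_stats(table, computed_for, stats):
    """Attach stats only if the snapshot they were computed for is still current."""
    if table["current_snapshot_id"] != computed_for:
        raise StaleSnapshotError(
            "stats computed for snapshot %d, current is %d"
            % (computed_for, table["current_snapshot_id"])
        )
    table["stats"][computed_for] = stats


def stats_for_scan(table, snapshot_id=None):
    """Return stats for the snapshot being read (current unless time traveling)."""
    if snapshot_id is None:
        snapshot_id = table["current_snapshot_id"]
    return table["stats"].get(snapshot_id)


table = {"current_snapshot_id": 1, "stats": {}}
computed_for = table["current_snapshot_id"]  # the stats job starts on snapshot 1
table["current_snapshot_id"] = 2             # a concurrent write commits snapshot 2

try:
    commit_stats(table, computed_for, {"ndv": {"id": 100}})
    outcome = "committed"
except StaleSnapshotError:
    outcome = "rejected"

# Hypothetically keep the stale stats anyway, to see who could use them.
table["stats"][1] = {"ndv": {"id": 100}}
current_read = stats_for_scan(table)                     # reads snapshot 2
time_travel_read = stats_for_scan(table, snapshot_id=1)  # reads snapshot 1
print(outcome, current_read, time_travel_read)
```

   In this model, a read of the current snapshot finds no stats, while only a 
time travel read of snapshot 1 would find the stale ones.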
   Finally, for compaction operations such as `rewrite_data_files` and 
`rewrite_position_delete_files`, there is benefit to partial progress (e.g., 
some but not all partitions of a table being successfully compacted), hence the 
provision of a flag for enabling it. For `compute_table_stats`, this is not 
applicable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

