szehon-ho commented on pull request #3844:
URL: https://github.com/apache/iceberg/pull/3844#issuecomment-1069766220
I'm a bit conflicted. It makes sense to have a fast truncate, but the presence of DELETED entries in the manifests is also relied on elsewhere to check whether data has been deleted, for example:
1. checking serializable isolation of concurrent operations (which must fail if data they use has been deleted)
2. the CDC design (to mark a row as a deleted row)
If we truncate this way from Spark/Flink, any system relying on those entries won't work. Is that a concern? Or is this more like a drop-table operation where we no longer care about the table? cc @aokolnychyi
The other thought is that we can achieve the same result by doing DeleteFiles.deleteFromRowFilter(Expressions.alwaysTrue()) (see the sketch below). It is a bit slower because each manifest file has to be read, but it is still faster than reading the data files. Not sure what others think.
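
For illustration, a minimal sketch of that alternative, assuming the standard Table / DeleteFiles API (the method and class names here are just for the example, not code from this PR):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

public class TruncateViaDeleteFiles {

  // Sketch: "truncate" by deleting every data file through the DeleteFiles API.
  // This rewrites the manifests and records DELETED entries for the removed
  // files, so readers of manifest status (isolation checks, CDC) still see them.
  public static void truncate(Table table) {
    table.newDelete()                                    // returns a DeleteFiles operation
        .deleteFromRowFilter(Expressions.alwaysTrue())   // matches all data files in the table
        .commit();                                       // produces a new snapshot with DELETED entries
  }
}
```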