I'm very happy with removing trash support; it just complicates the code for a failure condition, "accidental deletion", which shouldn't surface.
The only time users may want to roll back a delete is DROP TABLE, and there it's the job of the catalog to give users a way to revert it.

It's not shipped yet, so removal is not a regression at all.

steve

On Wed, 18 Feb 2026 at 22:48, Ryan Blue <[email protected]> wrote:

> During the Iceberg sync this morning, Steve suggested a PR to fix a
> problem with HadoopFileIO, #15111. I looked into this a bit more and it is
> based on #14501, which implements a Hadoop scheme where delete may actually
> move a file to a configured trash directory rather than deleting it. I
> think that this trash behavior is strange and doesn't fit into FileIO. I
> think the right thing to do is to probably remove it but I want to see what
> arguments for the behavior there are.
>
> In my opinion, the trash behavior is confusing and not obvious for the
> FileIO interface. The behavior, as I understand it, is to check whether a
> file should actually be deleted or should just be moved to a trash folder.
> Interestingly, this is not done underneath the Hadoop FileSystem interface,
> but is a client responsibility. Since FileIO is similar to FileSystem, I
> think there's a strong argument that it isn't appropriate within FileIO
> either. But there's another argument for not having this behavior, which is
> that table changes and user-driven file changes are not the same. Tables can
> churn files quite a bit and deletes shouldn't move uncommitted files to
> trash -- they don't need to be recovered -- nor should they move replaced
> or deleted data files to a trash folder that could be in a user's home
> directory -- this is a big and not obvious behavior change. This seems to
> be in conflict with reasonable governance schemes because it could leak
> sensitive data.
>
> Next, the use case for a trash folder is to recover from accidental
> deletes by users. This is unnecessary in Iceberg because tables keep their
> own history. Accidental data operations are easily rolled back and we have
> a configurable history in which you can do it. This is also already
> integrated cleanly so that temporary metadata files that end up not being
> committed are not held.
>
> In the end, I think that we don't need this because history is already
> kept in a better way for tables, and this feature is confusing and doesn't
> fit in the API. What are the use cases for keeping this?
>
> Ryan
>
