+1 for removing this feature to keep things simple and predictable. Just for context, the relevant PRs: Original PR: https://github.com/apache/iceberg/pull/14501 Revert: https://github.com/apache/iceberg/pull/15386
-Max On Tue, Feb 24, 2026 at 6:08 PM Ryan Blue <[email protected]> wrote: > > Iceberg generates a unique filename for everything, so it should be true that > you can recover in-place. I think there's a prototype recovery > action/procedure out there that does this. The problem is that it is specific > to a FileIO implementation like S3 and wouldn't help with HDFS. The S3 data > platforms I've worked on have all used S3 versioning as a core component (it > can also protect against malicious changes) but it doesn't generalize. I > think these things are the responsibility of object storage. > > On Tue, Feb 24, 2026 at 8:34 AM Steve Loughran <[email protected]> wrote: >> >> Amazon classic S3 has versioning. >> >> One thing which could be considered would be to take the manifest of a >> specific version of a table, enumerate the s3 version of it and all the >> files it then references, to producea a list of the version IDs of the files >> which a version of a table referenced. >> >> Given that information, the next bit of work is to regenerate a (different) >> table from that data. >> >> You would not want to restore each object (that'd break anything referring >> to a later version), but to read that underlying version and write it >> elsewhere. Actually, if that no filename was ever recycled, the old versions >> could be restored and a recovered table set up with the old files as >> it...it'd be a lot simpler. Is that 100% true, always? >> >> >> >> On Mon, 23 Feb 2026 at 22:18, Ryan Blue <[email protected]> wrote: >>> >>> I merged the PR to revert this since I don't think anyone is strongly for >>> keeping it. I also think Steve is right that if we have NN pressure we >>> would want to use a bulk endpoint and that it won't be better to use >>> renames. >>> >>> The original author also confirmed on the PR that they can use a custom >>> FileIO for this and don't need it to be in Iceberg. That use case was >>> around having a way to undo bad orphan file cleanup, which would delete >>> files underneath the table. I don't think that's really an Iceberg >>> responsibility, again because if it were it would be built into the Hadoop >>> FileSystem rather than the FileIO layer above. >>> >>> It would also be a good idea to think about how to alternatively address >>> those cases. I think replicas are a good way to address it that are going >>> to be easier to produce in v4 (with relative paths) but other ideas are >>> definitely welcome here! >>> >>> On Mon, Feb 23, 2026 at 3:07 AM Steve Loughran <[email protected]> wrote: >>>> >>>> >>>> On Sat, 21 Feb 2026 at 09:02, Cheng Pan <[email protected]> wrote: >>>>> >>>>> Share a use case of HDFS Trash - deleting a directory on HDFS that has >>>>> tons of files might cause significant pressure on the NameNode and >>>>> slow the HDFS cluster for dozens of minutes, while moving to Trash is >>>>> relatively cheap, then those files can be deleted in the background >>>>> after reaching expiration time, in small batches, thus no pressure and >>>>> latency on the NameNode. >>>>> >>>> >>>> iceberg is only to be deleting files though, not directories; it''ll be >>>> acquiring a lock per file for a delete, and for a rename needs to get a >>>> lock of ~.Trash too. I don't see it being any worse here. >>>> >>>> Now, if you were to add bulk delete support to hdfs, we could send a >>>> single RPC there with a batch of files and hdfs could go through them and >>>> delete in turn, failing if a dir was encountered.And like the s3a >>>> implementation, it could be throttled: you'd implement that on the server >>>> before actually acquiring any locks so all callers of bulk delete would be >>>> constrained >>>> >>>> >>>>> >>>>> If possible, I would still like Iceberg to have this feature. >>>>> >>>>> Thanks, >>>>> Cheng Pan >>>>> >>>>> On Fri, Feb 20, 2026 at 3:22 AM Daniel Weeks <[email protected]> wrote: >>>>> > >>>>> > I agree with Steve and Ryan on this. >>>>> > >>>>> > I was a bit critical of all the issues with configuration and behavior >>>>> > when reviewing the PR, but felt that containing it to HDFS might make >>>>> > it reasonable to close the gap in behavior between Hive tables and >>>>> > Iceberg. >>>>> > >>>>> > However, it is complicated, messy and could cause surprising behavior >>>>> > for anyone who has it turned on in their environment when it suddenly >>>>> > starts being respected causing lots of trash behavior. >>>>> > >>>>> > I'll open a PR to revert and reach out to the original author. >>>>> > >>>>> > -Dan >>>>> > >>>>> > On Thu, Feb 19, 2026 at 11:14 AM Steve Loughran <[email protected]> >>>>> > wrote: >>>>> >> >>>>> >> >>>>> >> I'm very happy with removing support; it just complicates the code for >>>>> >> a failure condition "accidental deletion" which shouldn't surface. >>>>> >> >>>>> >> The only times where the users may want to roll back a delete is DROP >>>>> >> TABLE, and there it's the homework of the catalog to give users a way >>>>> >> to revert it. >>>>> >> >>>>> >> It's not shipped yet so removal is not a regression at all. >>>>> >> >>>>> >> steve >>>>> >> >>>>> >> >>>>> >> On Wed, 18 Feb 2026 at 22:48, Ryan Blue <[email protected]> wrote: >>>>> >>> >>>>> >>> During the Iceberg sync this morning, Steve suggested a PR to fix a >>>>> >>> problem with HadoopFileIO, #15111. I looked into this a bit more and >>>>> >>> it is based on #14501, which implements a Hadoop scheme where delete >>>>> >>> may actually move a file to a configured trash directory rather than >>>>> >>> deleting it. I think that this trash behavior is strange and doesn't >>>>> >>> fit into FileIO. I think the right thing to do is to probably remove >>>>> >>> it but I want to see what arguments for the behavior there are. >>>>> >>> >>>>> >>> In my opinion, the trash behavior is confusing and not obvious for >>>>> >>> the FileIO interface. The behavior, as I understand it, is to check >>>>> >>> whether a file should actually be deleted or should just be moved to >>>>> >>> a trash folder. Interestingly, this is not done underneath the Hadoop >>>>> >>> FileSystem interface, but is a client responsibility. Since FileIO is >>>>> >>> similar to FileSystem, I think there's a strong argument that it >>>>> >>> isn't appropriate within FileIO either. But there's another argument >>>>> >>> for not having this behavior, which is that table changes and >>>>> >>> user-driven file changes are not the same. Table can churn files >>>>> >>> quite a bit and deletes shouldn't move uncommitted files to trash -- >>>>> >>> they don't need to be recovered -- nor should they move replaced or >>>>> >>> deleted data files to a trash folder that could be in a user's home >>>>> >>> directory -- this is a big and not obvious behavior change. This >>>>> >>> seems to be in conflict with reasonable governance schemes because it >>>>> >>> could leak sensitive data. >>>>> >>> >>>>> >>> Next, the use case for a trash folder is to recover from accidental >>>>> >>> deletes by users. This is unnecessary in Iceberg because tables keep >>>>> >>> their own history. Accidental data operations are easily rolled back >>>>> >>> and we have a configurable history in which you can do it. This is >>>>> >>> also already integrated cleanly so that temporary metadata files that >>>>> >>> end up not being committed are not held. >>>>> >>> >>>>> >>> In the end, I think that we don't need this because history is >>>>> >>> already kept in a better way for tables, and this feature is >>>>> >>> confusing and doesn't fit in the API. What are the use cases for >>>>> >>> keeping this? >>>>> >>> >>>>> >>> Ryan
