+1 for removing this feature to keep things simple and predictable.

Just for context, the relevant PRs:
Original PR: https://github.com/apache/iceberg/pull/14501
Revert: https://github.com/apache/iceberg/pull/15386

-Max

On Tue, Feb 24, 2026 at 6:08 PM Ryan Blue <[email protected]> wrote:
>
> Iceberg generates a unique filename for everything, so it should be true that 
> you can recover in-place. I think there's a prototype recovery 
> action/procedure out there that does this. The problem is that it is specific 
> to a FileIO implementation like S3 and wouldn't help with HDFS. The S3 data 
> platforms I've worked on have all used S3 versioning as a core component (it 
> can also protect against malicious changes) but it doesn't generalize. I 
> think these things are the responsibility of object storage.
>
> On Tue, Feb 24, 2026 at 8:34 AM Steve Loughran <[email protected]> wrote:
>>
>> Amazon classic S3 has versioning.
>>
>> One thing which could be considered would be to take the manifest of a 
>> specific version of a table, enumerate the S3 versions of it and of all the 
>> files it references, and produce a list of the version IDs of the files 
>> which that version of the table referenced.
>>
>> Given that information, the next bit of work is to regenerate a (different) 
>> table from that data.
>>
>> You would not want to restore each object (that'd break anything referring 
>> to a later version), but to read that underlying version and write it 
>> elsewhere. Actually, if no filename was ever recycled, the old versions 
>> could be restored and a recovered table set up with the old files as-is; 
>> it'd be a lot simpler. Is that 100% true, always?
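The version-pinning step Steve describes could be sketched like this (a minimal sketch in plain Python; `versions_by_key` and the dict shape are hypothetical stand-ins for what enumerating S3 ListObjectVersions per key would return):

```python
def pin_versions(referenced_keys, versions_by_key, as_of):
    """For each file a table version references, pick the S3 object version
    that was current at that table version's commit time (as_of).

    versions_by_key maps key -> list of {"version_id", "last_modified"}
    dicts (hypothetical shape of a per-key ListObjectVersions result).
    """
    pinned = {}
    for key in referenced_keys:
        # Keep only versions that already existed at the commit time.
        candidates = [v for v in versions_by_key.get(key, [])
                      if v["last_modified"] <= as_of]
        if not candidates:
            raise LookupError(f"no version of {key} existed at {as_of}")
        # The newest of those is the one the table version actually read.
        pinned[key] = max(candidates,
                          key=lambda v: v["last_modified"])["version_id"]
    return pinned
```

Given the pinned version IDs, regenerating a recovered table is then a matter of reading each (key, version_id) pair and writing it to the new location.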
>>
>>
>>
>> On Mon, 23 Feb 2026 at 22:18, Ryan Blue <[email protected]> wrote:
>>>
>>> I merged the PR to revert this since I don't think anyone is strongly for 
>>> keeping it. I also think Steve is right that if we have NameNode pressure 
>>> we would want to use a bulk endpoint, and that renames won't be any 
>>> better.
>>>
>>> The original author also confirmed on the PR that they can use a custom 
>>> FileIO for this and don't need it to be in Iceberg. That use case was 
>>> around having a way to undo bad orphan file cleanup, which would delete 
>>> files underneath the table. I don't think that's really an Iceberg 
>>> responsibility, again because if it were it would be built into the Hadoop 
>>> FileSystem rather than the FileIO layer above.
>>>
>>> It would also be a good idea to think about how to address those cases 
>>> another way. I think replicas are a good way to address this, and they are 
>>> going to be easier to produce in v4 (with relative paths), but other ideas 
>>> are definitely welcome here!
>>>
>>> On Mon, Feb 23, 2026 at 3:07 AM Steve Loughran <[email protected]> wrote:
>>>>
>>>>
>>>> On Sat, 21 Feb 2026 at 09:02, Cheng Pan <[email protected]> wrote:
>>>>>
>>>>> Share a use case of HDFS Trash - deleting a directory on HDFS that has
>>>>> tons of files might cause significant pressure on the NameNode and
>>>>> slow the HDFS cluster for dozens of minutes, while moving to Trash is
>>>>> relatively cheap. Those files can then be deleted in the background
>>>>> after reaching their expiration time, in small batches, putting no
>>>>> pressure or latency on the NameNode.
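The background-expiry pattern described above could look roughly like this (a sketch with assumed names; real HDFS trash emptying lives in the Trash/TrashPolicy machinery, not client code):

```python
def expired_batches(trash_entries, now, ttl_seconds, batch_size):
    """Yield small batches of trashed paths whose retention has expired,
    so deletes can be issued gradually rather than in one huge burst.

    trash_entries is a list of (path, trashed_at_seconds) tuples.
    """
    # Only paths whose trash timestamp is older than the TTL are eligible.
    expired = sorted(p for p, t in trash_entries if now - t >= ttl_seconds)
    # Hand them out in small pages so each delete round is cheap.
    for i in range(0, len(expired), batch_size):
        yield expired[i:i + batch_size]
```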
>>>>>
>>>>
>>>> Iceberg is only going to be deleting files though, not directories; it'll 
>>>> be acquiring a lock per file for a delete, and for a rename it needs to 
>>>> get a lock on ~/.Trash too. I don't see it being any worse here.
>>>>
>>>> Now, if you were to add bulk delete support to HDFS, we could send a 
>>>> single RPC there with a batch of files, and HDFS could go through them and 
>>>> delete each in turn, failing if a directory was encountered. And like the 
>>>> s3a implementation, it could be throttled: you'd implement that on the 
>>>> server before actually acquiring any locks, so all callers of bulk delete 
>>>> would be constrained.
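The server-side validate-then-page step for such an endpoint might be sketched as follows (a hypothetical illustration of the semantics being proposed, not how the S3A bulk delete or any HDFS endpoint is actually implemented):

```python
def plan_bulk_delete(paths, is_directory, page_size):
    """Validate a bulk-delete request and split it into throttled pages,
    all before any namespace locks are acquired.

    is_directory is a callable path -> bool (a stand-in for a namespace
    lookup). Fails fast if any path is a directory, mirroring the
    file-only semantics suggested above.
    """
    dirs = [p for p in paths if is_directory(p)]
    if dirs:
        raise ValueError(f"bulk delete only accepts files, got directories: {dirs}")
    # Page size acts as the throttle: every caller is bounded the same way.
    return [paths[i:i + page_size] for i in range(0, len(paths), page_size)]
```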
>>>>
>>>>
>>>>>
>>>>> If possible, I would still like Iceberg to have this feature.
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>> On Fri, Feb 20, 2026 at 3:22 AM Daniel Weeks <[email protected]> wrote:
>>>>> >
>>>>> > I agree with Steve and Ryan on this.
>>>>> >
>>>>> > I was a bit critical of all the issues with configuration and behavior 
>>>>> > when reviewing the PR, but felt that containing it to HDFS might make 
>>>>> > it reasonable to close the gap in behavior between Hive tables and 
>>>>> > Iceberg.
>>>>> >
>>>>> > However, it is complicated, messy, and could cause surprising behavior 
>>>>> > for anyone who has it turned on in their environment when it suddenly 
>>>>> > starts being respected, triggering lots of trash activity.
>>>>> >
>>>>> > I'll open a PR to revert and reach out to the original author.
>>>>> >
>>>>> > -Dan
>>>>> >
>>>>> > On Thu, Feb 19, 2026 at 11:14 AM Steve Loughran <[email protected]> 
>>>>> > wrote:
>>>>> >>
>>>>> >>
>>>>> >> I'm very happy with removing support; it just complicates the code for 
>>>>> >> a failure condition "accidental deletion" which shouldn't surface.
>>>>> >>
>>>>> >> The only time users may want to roll back a delete is DROP 
>>>>> >> TABLE, and there it's the responsibility of the catalog to give users 
>>>>> >> a way to revert it.
>>>>> >>
>>>>> >> It's not shipped yet so removal is not a regression at all.
>>>>> >>
>>>>> >> steve
>>>>> >>
>>>>> >>
>>>>> >> On Wed, 18 Feb 2026 at 22:48, Ryan Blue <[email protected]> wrote:
>>>>> >>>
>>>>> >>> During the Iceberg sync this morning, Steve suggested a PR to fix a 
>>>>> >>> problem with HadoopFileIO, #15111. I looked into this a bit more and 
>>>>> >>> it is based on #14501, which implements a Hadoop scheme where delete 
>>>>> >>> may actually move a file to a configured trash directory rather than 
>>>>> >>> deleting it. I think that this trash behavior is strange and doesn't 
>>>>> >>> fit into FileIO. I think the right thing to do is to probably remove 
>>>>> >>> it but I want to see what arguments for the behavior there are.
>>>>> >>>
>>>>> >>> In my opinion, the trash behavior is confusing and not obvious for 
>>>>> >>> the FileIO interface. The behavior, as I understand it, is to check 
>>>>> >>> whether a file should actually be deleted or should just be moved to 
>>>>> >>> a trash folder. Interestingly, this is not done underneath the Hadoop 
>>>>> >>> FileSystem interface, but is a client responsibility. Since FileIO is 
>>>>> >>> similar to FileSystem, I think there's a strong argument that it 
>>>>> >>> isn't appropriate within FileIO either. But there's another argument 
>>>>> >>> for not having this behavior, which is that table changes and 
>>>>> >>> user-driven file changes are not the same. Tables can churn files 
>>>>> >>> quite a bit and deletes shouldn't move uncommitted files to trash -- 
>>>>> >>> they don't need to be recovered -- nor should they move replaced or 
>>>>> >>> deleted data files to a trash folder that could be in a user's home 
>>>>> >>> directory -- this is a big and non-obvious behavior change. This 
>>>>> >>> seems to be in conflict with reasonable governance schemes because it 
>>>>> >>> could leak sensitive data.
>>>>> >>>
>>>>> >>> Next, the use case for a trash folder is to recover from accidental 
>>>>> >>> deletes by users. This is unnecessary in Iceberg because tables keep 
>>>>> >>> their own history. Accidental data operations are easily rolled back 
>>>>> >>> within a configurable history window. This is also already integrated 
>>>>> >>> cleanly so that temporary metadata files that end up not being 
>>>>> >>> committed are not retained.
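The rollback path being contrasted with trash folders can be illustrated with a toy model of snapshot history (purely illustrative; in the Iceberg Java API this is done through the table's snapshot-management operations, not a class like this):

```python
class ToyTable:
    """Toy model of Iceberg-style snapshot history: every commit appends a
    snapshot, and rollback just repoints 'current' at an older one, so no
    trash folder is needed to undo an accidental data operation."""

    def __init__(self):
        self.snapshots = []   # ordered snapshot ids, oldest first
        self.current = None

    def commit(self, snapshot_id):
        self.snapshots.append(snapshot_id)
        self.current = snapshot_id

    def rollback_to(self, snapshot_id):
        if snapshot_id not in self.snapshots:
            raise LookupError(f"snapshot {snapshot_id} not in retained history")
        # Data files are untouched; only the current-snapshot pointer moves.
        self.current = snapshot_id
```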
>>>>> >>>
>>>>> >>> In the end, I think that we don't need this because history is 
>>>>> >>> already kept in a better way for tables, and this feature is 
>>>>> >>> confusing and doesn't fit in the API. What are the use cases for 
>>>>> >>> keeping this?
>>>>> >>>
>>>>> >>> Ryan
