Re: [PR] Core: Move deleted files to Hadoop trash if configured [iceberg]

via GitHub Tue, 18 Nov 2025 10:18:23 -0800


danielcweeks commented on PR #14501:
URL: https://github.com/apache/iceberg/pull/14501#issuecomment-3548976534


   @jordepic and @ludlows After looking a little more into the way trash works, 
I don't think this is something we want to turn on at a table level (especially 
considering how this implementation works).
   
   The Trash feature in Hadoop/HDFS is quite strange as it's a client-, config-, 
and cluster-level feature whose parts all depend upon each other.  For example, the 
client has to respect the config, initialize the Trash, and perform a move 
operation; otherwise the setting is ignored.  The config has to be set and configured 
properly to a location the user has access to.  Finally, if you don't apply the 
configuration to both the client and the NameNode, then cleanup won't be 
performed properly.
   
   Given all of that, this feels very much like an administrator-level feature 
that needs to be configured (this appears to be the case for Cloudera already, 
though I don't know if engines like Hive/Impala respect the trash settings).
   
   It could be dangerous to allow users to configure this on a 
per-table basis because cleanup may not be configured, which may leave data 
that should have been deleted persisting in the file system.  There also appears 
to be nothing that prevents the configuration from being applied to other 
file-system implementations (like S3A), which would be bad (data copy, no 
cleanup), so I feel like we should discourage that.  @jordepic Is there 
anything we can do to prevent this?
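   
   To make the question concrete, one possible shape for such a guard is a 
check on the URI scheme of the file being deleted.  This is only a sketch: the 
class name `TrashSchemeGuard`, the method `mayUseTrash`, and the allow-list 
containing just `hdfs` are all hypothetical and are not part of this PR or of 
any Iceberg or Hadoop API.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not an Iceberg or Hadoop class.
final class TrashSchemeGuard {
  // Assumption: only HDFS locations should ever be moved to trash.
  // The real set of acceptable schemes would need to be decided in the PR.
  private static final Set<String> TRASH_CAPABLE_SCHEMES = Set.of("hdfs");

  private TrashSchemeGuard() {}

  // Returns true only when trash was requested AND the location's scheme
  // is on the allow-list; anything else falls back to a plain delete.
  public static boolean mayUseTrash(String location, boolean trashRequested) {
    if (!trashRequested) {
      return false;
    }
    String scheme = URI.create(location).getScheme();
    return scheme != null
        && TRASH_CAPABLE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(mayUseTrash("hdfs://nn:8020/warehouse/t/data/f.parquet", true));
    System.out.println(mayUseTrash("s3a://bucket/warehouse/t/data/f.parquet", true));
  }
}
```

   A scheme check like this would not address the cleanup concern, since it 
cannot tell whether the NameNode is actually configured to expire trash; it 
would only keep the move-to-trash path away from file systems such as S3A.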
   
   I'm not a huge fan of this approach, but it seems like what we have to work 
with.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


