jordepic commented on PR #14501:
URL: https://github.com/apache/iceberg/pull/14501#issuecomment-3549637178

   > The issue is that the config can be different for the client than for the 
NameNode. So if a client configures interval > 0, but the NameNode does not 
have that config, then a client will move data files to the trash, but they 
will never be cleaned up.
   
   Good point. Though, at the end of the day, I'm not sure I see this as 
different from any other misconfiguration that could adversely affect an 
Iceberg user. For example, we once misconfigured a table location and then 
removed an entire Hadoop directory, thinking its contents were orphan files, 
haha!
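One way to guard against the client/NameNode mismatch is to move files to the trash only when both sides have trash enabled, since the trash emptier that eventually purges checkpoints runs on the NameNode. The sketch below is a minimal, stdlib-only illustration of that policy; `TrashIntervalCheck` and `shouldMoveToTrash` are hypothetical names, not part of the PR or of Hadoop. In a real Hadoop client the server-side value could presumably be read with `FileSystem.getServerDefaults(path).getTrashInterval()`, which is an assumption worth verifying against the Hadoop version in use.

```java
/**
 * Hypothetical helper: decides whether deleted data files should be moved to
 * the trash, given the client's fs.trash.interval and the interval reported
 * by the NameNode. Intervals are in minutes; 0 means trash is disabled.
 */
class TrashIntervalCheck {
    static boolean shouldMoveToTrash(long clientIntervalMinutes, long serverIntervalMinutes) {
        // The client must opt in to trash behavior...
        boolean clientEnabled = clientIntervalMinutes > 0;
        // ...and the NameNode must also have trash enabled; otherwise nothing
        // would ever empty the trash and the moved files would linger forever.
        boolean serverEnabled = serverIntervalMinutes > 0;
        return clientEnabled && serverEnabled;
    }
}
```

With this policy, a client configured with `fs.trash.interval=1440` against a NameNode reporting `0` falls back to a permanent delete instead of leaking files into the trash.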
   
   > HadoopFileIO is an abstraction for all Hadoop FileSystem implementations 
(DistributedFileSystem, S3AFileSystem, GCSFileSystem, etc.). That means that if 
I enable this in core-site.xml and use an S3-mapped scheme, I would trigger the 
move behavior, which I don't think we want for non-HDFS file systems. The 
config (fs.trash.interval) is not specific to a scheme, so it appears to be 
global for all file system implementations.
   
   Also a fair point. I think I could resolve this one pretty safely using 
some instanceof checks on the FileSystem object. Are you at all opposed to 
that?
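A minimal sketch of what such an `instanceof` gate might look like. To keep it compilable without Hadoop on the classpath, the `Stub*` classes are hypothetical stand-ins for Hadoop's `FileSystem`, `DistributedFileSystem`, and `S3AFileSystem`; `TrashGate` and `shouldMoveToTrash` are likewise illustrative names, not the PR's actual code.

```java
/**
 * Minimal sketch: only take the move-to-trash path for HDFS, because
 * fs.trash.interval is global rather than per scheme.
 */
class TrashGate {
    // Hypothetical stand-ins so the example compiles without Hadoop.
    static abstract class StubFileSystem {}
    static class StubDistributedFileSystem extends StubFileSystem {}
    static class StubS3AFileSystem extends StubFileSystem {}

    static boolean shouldMoveToTrash(StubFileSystem fs, long trashIntervalMinutes) {
        // Restrict trash behavior to HDFS explicitly; every other
        // implementation (S3A, GCS, ...) deletes directly even if the
        // global trash interval is set.
        return fs instanceof StubDistributedFileSystem && trashIntervalMinutes > 0;
    }
}
```

In real code the check would be against `org.apache.hadoop.hdfs.DistributedFileSystem`; comparing `fs.getScheme()` to `"hdfs"` is an alternative that avoids a compile-time dependency on the HDFS client classes.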
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
