jordepic commented on PR #14501:
URL: https://github.com/apache/iceberg/pull/14501#issuecomment-3549637178

> The issue is that the config can be different for the client than for the NameNode. So if a client configures interval > 0, but the NameNode does not have that config, then a client will move data files, but they will never be cleaned up.

Good point. Though, at the end of the day, I'm not sure that I see this differently from any other misconfiguration that an Iceberg user might have that would adversely impact them. For example, we misconfigured a table location and then removed an entire Hadoop directory thinking its contents were orphan files, haha!

> HadoopFileIO is an abstraction for all Hadoop FileSystem implementations (DistributedFileSystem, S3AFileSystem, GCSFileSystem, etc.). That means that if I enable this in core-site.xml and use an S3-mapped scheme, I would trigger the move behavior, which I don't think we want for non-HDFS file systems. The config (fs.trash.interval) is not specific to a scheme, so it appears to be global for all file system implementations.

Also a fair point. I think that I could resolve this one pretty safely using some `instanceof` checks on the FileSystem object. Are you at all opposed to that?
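To make the `instanceof` idea concrete, here is a rough sketch of the kind of guard I have in mind. It is not the PR's code: the nested classes are hypothetical stand-ins for `org.apache.hadoop.fs.FileSystem`, `org.apache.hadoop.hdfs.DistributedFileSystem`, and `org.apache.hadoop.fs.s3a.S3AFileSystem` so that the snippet compiles without Hadoop on the classpath, and `shouldMoveToTrash` is an illustrative name rather than an existing method.

```java
// Sketch only: the nested classes are hypothetical stand-ins for the Hadoop types.
// In real code they would be org.apache.hadoop.fs.FileSystem and its subclasses.
final class TrashGuardSketch {
    static class FileSystem {}
    static class DistributedFileSystem extends FileSystem {}
    static class S3AFileSystem extends FileSystem {}

    // Move to trash only when trash is enabled AND the file system is HDFS.
    // trashIntervalMinutes mirrors fs.trash.interval, where 0 means trash is disabled.
    static boolean shouldMoveToTrash(FileSystem fs, long trashIntervalMinutes) {
        return trashIntervalMinutes > 0 && fs instanceof DistributedFileSystem;
    }

    public static void main(String[] args) {
        // HDFS with trash enabled: move instead of delete.
        System.out.println(shouldMoveToTrash(new DistributedFileSystem(), 60));
        // S3A with the same global setting: keep the plain delete.
        System.out.println(shouldMoveToTrash(new S3AFileSystem(), 60));
        // HDFS with trash disabled: keep the plain delete.
        System.out.println(shouldMoveToTrash(new DistributedFileSystem(), 0));
    }
}
```

One caveat with this approach: `instanceof` matches subclasses of the HDFS implementation but not wrappers that delegate to it, so the real check may need to be broader than what is shown here.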
