kbendick commented on PR #5478: URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209758658
The current log is confusing for users because the distributed engines (in this case Spark) then go on to do some cleanup. Maybe we can update the log to say something like "skipping cleanup during current iteration; cleanup might occur later if this action is run from a distributed processing engine"? Then possibly add logs to Spark as well?

> I would like a lot to distinguish between "didn't try" and "nothing to clean up" -- even when "nothing to clean up" is because there were no expired files.

Maybe we need a flag such as "delayedFileCleanup", or more generally "runningViaDistributedAction", and then let Spark / the engines control the logs? I don't love it, but I'm throwing it out there because the current logging is admittedly confusing when the Spark action runs (which is the primary way users interact with this action).
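
To make the flag idea a bit more concrete, here is a rough, self-contained sketch of how a flag could separate "didn't try" from "nothing to clean up" in the log output. Everything in it is hypothetical: `CleanupLogSketch`, `CleanupOutcome`, `outcome`, and `logMessage` are names made up for illustration and are not part of Iceberg or of this PR, and the `delayedFileCleanup` parameter only mirrors the flag name floated above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Iceberg.
final class CleanupLogSketch {

  // Why no files were (or were) deleted during this pass.
  enum CleanupOutcome {
    DEFERRED_TO_ENGINE, // core did not try; a distributed action may clean up later
    NOTHING_TO_CLEAN,   // core tried, but there were no expired files
    CLEANED             // core deleted expired files itself
  }

  // delayedFileCleanup is the assumed flag a distributed action would set.
  static CleanupOutcome outcome(boolean delayedFileCleanup, List<String> expiredFiles) {
    if (delayedFileCleanup) {
      return CleanupOutcome.DEFERRED_TO_ENGINE;
    }
    return expiredFiles.isEmpty() ? CleanupOutcome.NOTHING_TO_CLEAN : CleanupOutcome.CLEANED;
  }

  // One distinct message per outcome, so "didn't try" never reads as "nothing to do".
  static String logMessage(CleanupOutcome outcome) {
    switch (outcome) {
      case DEFERRED_TO_ENGINE:
        return "Skipping cleanup during current iteration; cleanup might occur later "
            + "if this action is run from a distributed processing engine";
      case NOTHING_TO_CLEAN:
        return "Nothing to clean up: no expired files found";
      case CLEANED:
        return "Cleaned up expired files";
      default:
        throw new IllegalArgumentException("Unknown outcome: " + outcome);
    }
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    List<String> some = Arrays.asList("manifest-1.avro");
    System.out.println(logMessage(outcome(true, some)));
    System.out.println(logMessage(outcome(false, none)));
    System.out.println(logMessage(outcome(false, some)));
  }
}
```

With something along these lines, the core code would only report which of the three cases applied, and Spark (or any other engine) could decide what to log once its own cleanup has finished.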
