steveloughran commented on pull request #2971: URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1065424989
# new API in hadoop-common The `hadoop-mapreduce-client-core` JAR is only used in `AbfsManifestStoreOperations` and I understand your concerns about making mapreduce JAR a dependency, but the jar is optional. you don't need it for FS operations. maven tags it as provided so it isn't published transitively, and the only imports are in a new committer factory and the store operations. If you want the new commmitter, yes, you need the JAR. But that's always been the case (and why spark, ParquetOutputFormat etc all import it). Putting it into hadoop common becomes a commitment to maintain it -it will be used by someone- and adds the need for better testing and strict spec. I'd rather put that off to doing a proper Rename operation which worked well with stores as well as HDFS, and always raised an exception on failure, and come in on an interface both FileSystem and FileContext implemented. Being builder based, etags would be an optional builder param for stores which cared. But we'd need to get rename() right there, and that's a nightmare. # cleanup `FileSystem.delete()` doesn't use trash, so the `moveToTrash` flag _is_ needed. | **FS trash** | **moveToTrash** | **outcome** | |-------------|---------------|---------| | `true` | `false` | files deleted | | `true` | `true` | files to trash | | `false` | `false` | files deleted | | `false` | `true` | files deleted | Now, I do agree things are over complex. i was trying to be resilient to timeouts on dir deletes, but as I think that will be rare, how about I change the code * no attempt to downgrade from delete to move to trash if delete raises an exception * if moveToTrash is true, then fs trash *must* be enabled. This gives. | **FS trash** | **moveToTrash** | **outcome** | |-------------|---------------|---------| | `true` | `false` | files deleted | | `true` | `true` | files to trash | | `false` | `false` | files deleted | | `false` | `true` | error | And all the recovery and tests are gone. What I could do here is reject that error on job setup, rather than waiting for cleanup. do you agree? # my next steps * will cut recovery from delete failures in cleanup. * log committer at info in factory * whatever else yetus complains of -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
