[GitHub] [hadoop] steveloughran commented on pull request #2971: MAPREDUCE-7341. Intermediate Manifest Committer

GitBox Fri, 11 Mar 2022 11:29:53 -0800


steveloughran commented on pull request #2971:
URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1065424989



   # new API in hadoop-common
   
   The `hadoop-mapreduce-client-core` JAR is only used in 
`AbfsManifestStoreOperations` and
   
   I understand your concerns about making mapreduce JAR a dependency, but the 
jar is optional. you
   don't need it for FS operations. maven tags it as provided so it isn't 
published transitively, and
   the only imports are in a new committer factory and the store operations.
   
   If you want the new commmitter, yes, you need the JAR. But that's always 
been the case
   (and why spark, ParquetOutputFormat etc all import it).
   
   Putting it into hadoop common becomes a commitment to maintain it -it will 
be used by someone-
   and adds the need for better testing and strict spec.
   
   I'd rather put that off to doing a proper Rename operation which worked well 
with stores
   as well as HDFS, and always raised an exception on failure, and come in on 
an interface both
   FileSystem and FileContext implemented. Being builder based, etags would be 
an optional
   builder param for stores which cared.
   
   But we'd need to get rename() right there, and that's a nightmare.
   
   # cleanup
   
   `FileSystem.delete()` doesn't use trash, so the `moveToTrash` flag
   _is_ needed.
   
   |  **FS trash** | **moveToTrash** | **outcome** |
   |-------------|---------------|---------|
   |   `true`    |  `false`      | files deleted |
   |   `true`    |  `true`      | files to trash |
   |   `false`    |  `false`      | files deleted |
   |   `false`    |  `true`      | files deleted |
   
   Now, I do agree things are over complex. i was trying to be resilient to 
timeouts on dir deletes,
   but as I think that will be rare, how about I change the code
   
   * no attempt to downgrade from delete to move to trash if delete raises an 
exception
   * if moveToTrash is true, then fs trash *must* be enabled.
   
   This gives.
   
   |  **FS trash** | **moveToTrash** | **outcome** |
   |-------------|---------------|---------|
   |   `true`    |  `false`      | files deleted |
   |   `true`    |  `true`      | files to trash |
   |   `false`    |  `false`      | files deleted |
   |   `false`    |  `true`      | error |
   
   And all the recovery and tests are gone.
   
   What I could do here is reject that error on job setup, rather than waiting 
for cleanup.
   
   do you agree?
   
   # my next steps
   
   * will cut recovery from delete failures in cleanup.
   * log committer at info in factory
   * whatever else yetus complains of


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hadoop] steveloughran commented on pull request #2971: MAPREDUCE-7341. Intermediate Manifest Committer

Reply via email to