[jira] [Commented] (MAPREDUCE-7474) [ABFS] Improve commit resilience and performance in Manifest Committer

ASF GitHub Bot (Jira) Tue, 09 Apr 2024 11:29:17 -0700


    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835505#comment-17835505
 ]


ASF GitHub Bot commented on MAPREDUCE-7474:
-------------------------------------------

steveloughran opened a new pull request, #6716:
URL: https://github.com/apache/hadoop/pull/6716

   
   Improve the resilience of the task commit save and rename operations with retries.
     
   * Retries of save():
     5 attempts, with a 500 millisecond sleep between them. Not configurable.
     Issue: should we make this configurable?
   * Split delete(path, recursive) into deleteFile and rmdir for separate
     statistics.
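
As an illustration of the save() retry policy in the list above, here is a minimal sketch of a fixed-count, fixed-sleep retry loop. It is not the code in this PR: the class `RetrySketch`, the interface `IOCall`, and the method `retry` are hypothetical names, and only the values 5 and 500 come from the description.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper for illustration; not the manifest committer's code. */
final class RetrySketch {

  /** Values quoted in the PR description. */
  static final int ATTEMPTS = 5;
  static final long SLEEP_MILLIS = 500;

  /** Hypothetical callback type: an operation that may raise an IOException. */
  interface IOCall<T> {
    T call() throws IOException;
  }

  private RetrySketch() {
  }

  /**
   * Invoke the operation up to {@code attempts} times, sleeping between
   * failed attempts, and rethrow the last IOException if all attempts fail.
   */
  static <T> T retry(IOCall<T> operation, int attempts, long sleepMillis)
      throws IOException {
    if (attempts < 1) {
      throw new IllegalArgumentException("attempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        return operation.call();
      } catch (IOException e) {
        // remember the failure; it is rethrown if no attempt succeeds
        last = e;
      }
      if (attempt < attempts) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
          // restore the interrupt flag and stop retrying
          Thread.currentThread().interrupt();
          InterruptedIOException iioe =
              new InterruptedIOException("interrupted while waiting to retry");
          iioe.initCause(ie);
          throw iioe;
        }
      }
    }
    throw last;
  }
}
```

A caller would wrap the save in something like `RetrySketch.retry(() -> save(manifest), RetrySketch.ATTEMPTS, RetrySketch.SLEEP_MILLIS)`, where `save` stands in for whatever operation is being retried. Taking the attempt count and sleep time as parameters keeps the option of making them configurable later.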
     
   Test simulation expands to:
   * Support recovery through a countdown of calls to fail.
   * Simulate timeout before *and after* rename calls.
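
The two simulation features above can be pictured with a small fault-injecting wrapper. This is a sketch under assumptions, not the test code in this PR: `Renamer` and `CountdownFailingRenamer` are hypothetical names, and paths are plain strings to keep the example self-contained.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fault-injection sketch; not the test classes in the PR. */
final class FailingRenameSketch {

  /** Hypothetical minimal rename operation. */
  interface Renamer {
    void rename(String source, String dest) throws IOException;
  }

  /**
   * Wraps a Renamer and fails the first {@code failures} calls, either
   * before the inner rename runs or after it has already taken effect.
   */
  static final class CountdownFailingRenamer implements Renamer {
    private final Renamer inner;
    private final AtomicInteger failuresLeft;
    private final boolean failAfterRename;

    CountdownFailingRenamer(Renamer inner, int failures, boolean failAfterRename) {
      this.inner = inner;
      this.failuresLeft = new AtomicInteger(failures);
      this.failAfterRename = failAfterRename;
    }

    @Override
    public void rename(String source, String dest) throws IOException {
      // decrement the countdown, never going below zero;
      // getAndUpdate returns the value before the decrement
      boolean fail = failuresLeft.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
      if (fail && !failAfterRename) {
        throw new IOException("simulated timeout before rename");
      }
      inner.rename(source, dest);
      if (fail) {
        // the rename happened, but the caller only sees a failure
        throw new IOException("simulated timeout after rename");
      }
    }
  }

  private FailingRenameSketch() {
  }
}
```

Once the countdown reaches zero the wrapper passes calls straight through, so a retrying caller can recover. The "after" mode covers the case where the operation succeeded in the store but the caller saw an error, so the recovery path has to cope with a rename that has already been applied.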
   
   This is based on #6596 but skips the rate limiting logic spanning common and
   azure; instead it contains only changes in the manifest committer, which
   makes it easier to backport.
   
   
   ### How was this patch tested?
   
   * manual run of the new tests
   * full test suite left to yetus
   * azure test run in progress.
   
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> [ABFS] Improve commit resilience and performance in Manifest Committer
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7474
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7474
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.4.0, 3.3.6
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> * Manifest committer is not resilient to rename failures on task commit 
> without HADOOP-18012 rename recovery enabled. 
> * A large burst of delete calls was noted: are they needed?
>
> Relates to HADOOP-19093 but takes a more minimal approach, with the goal of
> changes in the manifest committer only.
>
> Initial proposed changes:
> * retry recovery on task commit rename, always (repeat save, delete, rename)
> * audit delete use and see if it can be pruned
> * maybe: rate limit some IO internally, but not delegate to abfs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

