[
https://issues.apache.org/jira/browse/HADOOP-15604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564444#comment-16564444
]
Steve Loughran commented on HADOOP-15604:
-----------------------------------------
reviewing the commit code
* If the problem is DDB, it'll be in {{finishedWrite}}
* there are retries going on, but we're not doing much with the retry callback; maybe we should.
Looking at {{S3AFS.finishedWrite()}}, it's doing
{{deleteUnnecessaryFakeDirectories()}} every time. I don't see that being
needed when doing a full set of commits: if you are committing 50 files to a
dir, then you don't need to issue 50+ batch delete calls, just one at the start
of the commit (maybe a WriteOperationsHelper.prepareForCommit(path) call).
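A minimal sketch of that idea, assuming a hypothetical tracker around the proposed {{prepareForCommit(path)}} call (neither the class nor the method exists in S3A today): dedupe the fake-directory cleanup per destination directory, so a 50-file commit issues one batch delete instead of 50+.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only: "FakeDirCleanupTracker" and "prepareForCommit"
// are illustrative names, not existing S3A APIs.
public class FakeDirCleanupTracker {
    private final Set<String> cleanedDirs = new HashSet<>();
    private int batchDeleteCalls = 0;

    /** Call once per file commit; only the first call per dir does work. */
    public void prepareForCommit(String destDir) {
        if (cleanedDirs.add(destDir)) {
            // In S3A this would be the single batch delete of fake
            // directory markers up the parent chain.
            batchDeleteCalls++;
        }
    }

    public int getBatchDeleteCalls() {
        return batchDeleteCalls;
    }

    public static void main(String[] args) {
        FakeDirCleanupTracker tracker = new FakeDirCleanupTracker();
        // committing 50 files into the same directory...
        for (int i = 0; i < 50; i++) {
            tracker.prepareForCommit("s3a://bucket/output");
        }
        // ...issues only one batch delete instead of 50
        System.out.println(tracker.getBatchDeleteCalls()); // prints 1
    }
}
```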
Similarly, it's calling S3Guard add ancestors every time. If we can track which
ancestors have already been updated as part of a commit, then many fewer DDB
queries will be needed. Looking at {{addAncestors()}}, it's at least one DDB
GET per file, even if they all share the same parent dir.
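The ancestor-tracking idea could look something like this sketch (again hypothetical; {{AncestorTracker}} and the op counter are illustrative, not S3A code): remember which ancestor dirs were already written during this bulk operation, and stop walking up once a known ancestor is hit, so 50 files sharing one parent cost two metadata-store ops rather than 50+ GETs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only: tracks ancestors already recorded during one
// bulk commit so addAncestors issues at most one metadata-store op per
// distinct ancestor, not one per file.
public class AncestorTracker {
    private final Set<String> knownAncestors = new HashSet<>();
    private int metadataStoreOps = 0;

    /** Record ancestors of one committed file, skipping ones already seen. */
    public void addAncestors(String filePath) {
        int slash = filePath.lastIndexOf('/');
        // stop above the "s3a://" scheme prefix
        while (slash > "s3a://".length()) {
            String dir = filePath.substring(0, slash);
            if (!knownAncestors.add(dir)) {
                // this ancestor (and therefore all above it) is already done
                break;
            }
            metadataStoreOps++; // would be one DDB GET/PUT in S3Guard
            slash = dir.lastIndexOf('/');
        }
    }

    public int getMetadataStoreOps() {
        return metadataStoreOps;
    }
}
```

With this shape the per-file cost collapses to a hash-set lookup once the shared parent has been recorded.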
I think we can do stuff here to reduce load, and it'll matter for the MPU
upload of HADOOP-15576.
Moving under HADOOP-15620 as a Hadoop 3.3 S3A top-level feature, renaming to
"bulk commits of S3A MPUs place needless excessive load on S3 & S3Guard"
> Test whether unprocessed items in the S3Guard DDB metadata store are caused
> by I/O thresholds
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-15604
> URL: https://issues.apache.org/jira/browse/HADOOP-15604
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Gabor Bota
> Assignee: Gabor Bota
> Priority: Major
>
> When ~50 files are being committed, each in its own thread from the commit
> pool, the DDB repo is probably being overloaded just from one single process
> doing task commit. We should be backing off more, especially given that
> failing on a write could potentially leave the store inconsistent with the
> FS (renames, etc.)
> It would be nice to have some tests to prove that the I/O thresholds are the
> reason for unprocessed items in the DynamoDB metadata store