[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

Steve Loughran (JIRA) Wed, 28 Mar 2018 04:49:39 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417224#comment-16417224
 ]


Steve Loughran commented on HADOOP-15349:
-----------------------------------------

Attached the full log (failure.log).

# There are no meaningful details in the exception. It should say something like
"DDB calls not completing", and maybe include some history.
# We should compare the results processed before and after each attempt. If the
count of items still outstanding is decreasing, then it's OK to keep retrying, as
things have slowed down, not failed.
# Review the timeout defaults and include details in the exception, e.g. "after 20s".
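The first two points can be sketched as a progress-aware retry loop. This is a minimal, standalone Java sketch under stated assumptions, not the S3Guard code: `ProgressAwareRetry`, `writeWithRetry` and its parameters are hypothetical, and the `IntUnaryOperator` stands in for a real DynamoDB batch write, which reports the items it left unprocessed. An attempt that reduces the outstanding count resets the stall counter. Only consecutive attempts with no progress count towards the limit, and the final `IOException` states what was outstanding, how long it took and the per-attempt history.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: keep retrying while a batch write makes progress. */
public class ProgressAwareRetry {

  /**
   * @param initialPending     number of items to write
   * @param attempt            simulated write: given the pending count,
   *                           returns how many items were left unprocessed
   * @param maxStalledAttempts consecutive no-progress attempts tolerated
   * @param sleepMillis        base backoff between attempts
   * @return total number of attempts made
   */
  public static int writeWithRetry(int initialPending, IntUnaryOperator attempt,
      int maxStalledAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    long start = System.nanoTime();
    List<Integer> history = new ArrayList<>();
    history.add(initialPending);
    int pending = initialPending;
    int attempts = 0;
    int stalled = 0;
    while (pending > 0) {
      int remaining = attempt.applyAsInt(pending);
      attempts++;
      history.add(remaining);
      if (remaining < pending) {
        // Progress: the service has slowed down, not failed.
        stalled = 0;
      } else if (++stalled >= maxStalledAttempts) {
        // No progress for too long: fail with the details a user needs.
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        throw new IOException(String.format(
            "DynamoDB batch write not completing: %d of %d items still unprocessed"
                + " after %d attempts (%d without progress) in %d ms;"
                + " unprocessed counts (initial, then per attempt): %s",
            remaining, initialPending, attempts, stalled, elapsedMs, history));
      }
      pending = remaining;
      if (pending > 0) {
        // Exponential backoff on stalls; the shift is capped to avoid overflow.
        Thread.sleep(sleepMillis << Math.min(stalled, 10));
      }
    }
    return attempts;
  }

  public static void main(String[] args) throws Exception {
    // Slow but progressing: the backlog halves on each attempt, so it completes.
    System.out.println("completed after "
        + writeWithRetry(8, p -> p / 2, 3, 1) + " attempts");
    // Stuck: nothing is ever processed, so it fails informatively.
    try {
      writeWithRetry(5, p -> p, 3, 1);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because every attempt that counts as progress removes at least one item, the loop always terminates. Running `main` prints `completed after 4 attempts`, then the stall message, whose elapsed time varies from run to run.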

This happened during job commit, which is a pretty sensitive phase. The job *did*
complete successfully, because the commit is wrapped in retry code too. But I think
we could have handled it better at the lower levels, as not all applications wrap
their calls in that much retry logic.

+[~fabbri] [~gabor.bota]

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> ------------------------------------------------------------------
>
>                 Key: HADOOP-15349
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15349
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>         Attachments: failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, the exception is not at all informative. Improve the logging and the exception message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
