[
https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029922#comment-18029922
]
ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------
github-actions[bot] closed pull request #6051: MAPREDUCE-7445.
ShuffleSchedulerImpl causes ArithmeticException due to improper detailsInterval
value checking
URL: https://github.com/apache/hadoop/pull/6051
> ShuffleSchedulerImpl causes ArithmeticException due to improper
> detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7445
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.3.3
> Reporter: ConfX
> Priority: Critical
> Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened
> There is no value checking for the parameter
> {{mapreduce.reduce.shuffle.maxfetchfailures}}. An invalid value can lead to
> incorrect calculations and crash the system, for example through a division by zero.
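A minimal sketch of the missing value check follows. It assumes the value is validated once, right after it is read from configuration. The class and method names (`FetchFailureConfig`, `validateMaxFetchFailures`) are hypothetical and are not part of Hadoop. Rejecting the value is only one option; a fix could just as well log a warning and fall back to a default.

```java
public class FetchFailureConfig {
    /** Configuration key named in this issue. */
    static final String MAX_FETCH_FAILURES_KEY = "mapreduce.reduce.shuffle.maxfetchfailures";

    /**
     * Hypothetical validator. The value is later used as a modulo divisor,
     * so 0 must be rejected to avoid "/ by zero". Negative values do not throw,
     * but this sketch rejects them too because a negative reporting threshold
     * has no obvious meaning.
     */
    static int validateMaxFetchFailures(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                MAX_FETCH_FAILURES_KEY + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(validateMaxFetchFailures(10)); // accepted, prints 10
        try {
            validateMaxFetchFailures(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```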
> h2. Buggy code
> In {{ShuffleSchedulerImpl.java}}, {{maxFetchFailuresBeforeReporting}} is never
> validated and is used unchecked in the method {{checkAndInformMRAppMaster}}. When
> {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the modulo operation
> in the condition below divides by zero and throws an ArithmeticException,
> crashing the system.
>
> {noformat}
> private void checkAndInformMRAppMaster(
>     ...
>   if (connectExcpt || (reportReadErrorImmediately && readError)
>       || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
>     ...
>   }
> {noformat}
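In Java, the integer remainder operator `%` throws `ArithmeticException` when its right operand is zero. The exception carries the same `/ by zero` message shown in the stack trace. The standalone sketch below is not Hadoop code. It evaluates only that sub-expression of the condition, with `failures` and `maxFetchFailuresBeforeReporting` as plain parameters standing in for the scheduler's state.

```java
public class ModuloByZeroDemo {
    /** Mirrors the sub-expression from checkAndInformMRAppMaster in isolation. */
    static boolean isReportingPoint(int failures, int maxFetchFailuresBeforeReporting) {
        return (failures % maxFetchFailuresBeforeReporting) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isReportingPoint(10, 10)); // true: 10 % 10 == 0
        System.out.println(isReportingPoint(3, 10));  // false: 3 % 10 == 3
        try {
            isReportingPoint(1, 0); // divisor 0, as with maxfetchfailures=0
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage()); // "/ by zero"
        }
    }
}
```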
> h2. How to reproduce
> (1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and
> {{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
> (2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}
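Step (1) also disables `mapreduce.reduce.shuffle.notify.readerror` because of how Java's `||` works. It short-circuits, so the modulo in the reporting condition is evaluated only when every operand before it is false. The sketch below assumes that `reportReadErrorImmediately` is the flag populated from `mapreduce.reduce.shuffle.notify.readerror`. It copies the condition from the snippet into a hypothetical standalone method whose parameters stand in for the scheduler's fields; this is not the real Hadoop method. With immediate read-error reporting enabled, a read error satisfies the condition early and the zero divisor is never reached.

```java
public class ShortCircuitDemo {
    /**
     * Hypothetical copy of the reporting condition from checkAndInformMRAppMaster,
     * with the scheduler fields turned into parameters.
     */
    static boolean shouldReport(boolean connectExcpt, boolean reportReadErrorImmediately,
                                boolean readError, int failures,
                                int maxFetchFailuresBeforeReporting, boolean hostFailed) {
        return connectExcpt || (reportReadErrorImmediately && readError)
            || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed;
    }

    public static void main(String[] args) {
        // Immediate read-error reporting on: the second operand is true,
        // so the modulo is skipped even though the divisor is 0.
        System.out.println(shouldReport(false, true, true, 1, 0, false)); // true

        // Immediate read-error reporting off: the modulo runs and divides by zero.
        try {
            shouldReport(false, false, true, 1, 0, false);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```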
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285)
> {noformat}
> For easy reproduction, run the attached reproduce.sh.
> We are happy to provide a patch if this issue is confirmed.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)