[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756557#comment-17756557
 ] 

ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------

moonlightingLL closed pull request #5968: MAPREDUCE-7445. ShuffleSchedulerImpl 
causes ArithmeticException due to improper detailsInterval value checking
URL: https://github.com/apache/hadoop/pull/5968




> ShuffleSchedulerImpl causes ArithmeticException due to improper 
> detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7445
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.3.3
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened
> There is no value checking for the parameter 
> {{mapreduce.reduce.shuffle.maxfetchfailures}}. An invalid value can lead to 
> incorrect calculations and crashes, such as a division by zero.
> h2. Buggy code
> In {{ShuffleSchedulerImpl.java}}, there is no value checking for 
> {{maxFetchFailuresBeforeReporting}}, and this variable is passed directly to 
> the method {{checkAndInformMRAppMaster}}. When 
> {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the expression 
> {{failures % maxFetchFailuresBeforeReporting}} divides by zero and throws an 
> ArithmeticException.
>  
> {noformat}
> private void checkAndInformMRAppMaster(
>      ...
>     if (connectExcpt || (reportReadErrorImmediately && readError)
>         || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
>       ...
>   }{noformat}
> h2. How to reproduce
> (1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and 
> {{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
> (2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>  
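
The failure described above can be sketched in isolation. The class below is
a hypothetical stand-in, not Hadoop code: reportsOnFailureCount mirrors only
the modulo term from checkAndInformMRAppMaster, and requirePositive is an
assumed validation helper showing one possible way to reject a non-positive
value before it reaches the modulo. It is not necessarily how the issue was
or will be fixed.

```java
public class FetchFailureGuard {

    // Hypothetical stand-in for the modulo term quoted in the report.
    // With a divisor of 0, integer remainder throws ArithmeticException.
    static boolean reportsOnFailureCount(int failures,
                                         int maxFetchFailuresBeforeReporting) {
        return (failures % maxFetchFailuresBeforeReporting) == 0;
    }

    // Assumed helper (not part of Hadoop): fail fast with a descriptive
    // message instead of letting a later modulo crash.
    static int requirePositive(String key, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                key + " must be greater than 0, but was " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        String key = "mapreduce.reduce.shuffle.maxfetchfailures";

        // Unchecked path: reproduces the exception type from the stack trace.
        try {
            reportsOnFailureCount(3, 0);
        } catch (ArithmeticException e) {
            System.out.println("unchecked: " + e);
        }

        // Checked path: the bad value is rejected up front.
        try {
            requirePositive(key, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("checked: " + e.getMessage());
        }

        // A positive value passes validation and the modulo is safe.
        int limit = requirePositive(key, 10);
        System.out.println(reportsOnFailureCount(20, limit));
    }
}
```

Whether an invalid value should be rejected outright, as in this sketch, or
replaced by a default with a warning is a design choice left to the
maintainers.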



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
