[ https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756557#comment-17756557 ]
ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------

moonlightingLL closed pull request #5968: MAPREDUCE-7445. ShuffleSchedulerImpl causes ArithmeticException due to improper detailsInterval value checking
URL: https://github.com/apache/hadoop/pull/5968

> ShuffleSchedulerImpl causes ArithmeticException due to improper detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7445
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.3.3
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened
> There is no value checking for the parameter {{mapreduce.reduce.shuffle.maxfetchfailures}}.
> An invalid value can cause incorrect calculations and crash the system, for example through division by 0.
> h2. Buggy code
> In {{ShuffleSchedulerImpl.java}}, there is no value checking for {{maxFetchFailuresBeforeReporting}},
> and this variable is used directly in the method {{checkAndInformMRAppMaster}}.
> When {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the modulo operation below
> divides by 0 and throws an ArithmeticException, which crashes the system.
>
> {noformat}
> private void checkAndInformMRAppMaster(
>     ...
>   if (connectExcpt || (reportReadErrorImmediately && readError)
>       || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
>     ...
> }
> {noformat}
> h2. How to reproduce
> (1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and {{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
> (2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}.
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>         at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285)
> {noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
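
For illustration, the missing value check described above could look roughly like the following. This is a minimal standalone sketch, not the patch from the pull request and not Hadoop code: the class FetchFailureGuard and both of its methods are hypothetical names, and only the modulo term of the quoted condition is reproduced.

```java
// Hypothetical sketch: none of these names exist in Hadoop.
final class FetchFailureGuard {

    // Rejects a non-positive value for the setting up front, so the modulo
    // below can never see a zero divisor. Throwing is one possible policy;
    // falling back to a default value would be another.
    static int validateMaxFetchFailures(int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException(
                "mapreduce.reduce.shuffle.maxfetchfailures must be positive, got "
                    + configured);
        }
        return configured;
    }

    // Mirrors only the (failures % maxFetchFailuresBeforeReporting) == 0 term
    // of the condition quoted in the issue. Assumes the divisor was validated.
    static boolean reachedReportingThreshold(int failures,
                                             int maxFetchFailuresBeforeReporting) {
        return failures % maxFetchFailuresBeforeReporting == 0;
    }

    public static void main(String[] args) {
        int max = validateMaxFetchFailures(10);
        System.out.println(reachedReportingThreshold(20, max));
        try {
            validateMaxFetchFailures(0);
        } catch (IllegalArgumentException e) {
            // The bad value is reported at configuration time instead of
            // surfacing later as "/ by zero" inside the shuffle path.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating once, where the setting is read, keeps the failure next to the misconfiguration rather than deep in the copy-failure handling shown in the stack trace.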