[ https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756556#comment-17756556 ]
ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------

moonlightingLL opened a new pull request, #5968:
URL: https://github.com/apache/hadoop/pull/5968

### Description of PR

Add a zero check for `maxFetchFailuresBeforeReporting`, since the value of the parameter `mapreduce.reduce.shuffle.maxfetchfailures` is never validated. A value of 0 leads to a division by zero, which throws an ArithmeticException and crashes the system.

### How was this patch tested?

Set `mapreduce.reduce.shuffle.maxfetchfailures`=`0` and `mapreduce.reduce.shuffle.notify.readerror`=`false`. Then run `mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap`.

### For code changes:

- [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

> ShuffleSchedulerImpl causes ArithmeticException due to improper detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7445
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 3.3.3
>            Reporter: ConfX
>            Priority: Critical
>         Attachments: reproduce.sh
>
>
> h2. What happened
> The value of the parameter {{mapreduce.reduce.shuffle.maxfetchfailures}} is never validated. A value of 0 leads to a division by zero, which crashes the system.
> h2. Buggy code
> In {{ShuffleSchedulerImpl.java}}, {{maxFetchFailuresBeforeReporting}} is not validated before it is used in method {{checkAndInformMRAppMaster}}. When {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the modulo operation in the condition below divides by zero and throws an ArithmeticException, which crashes the system.
>
> {noformat}
> private void checkAndInformMRAppMaster(
>     ...
>   if (connectExcpt || (reportReadErrorImmediately && readError)
>       || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
>     ...
>   }{noformat}
> h2. How to reproduce
> (1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and {{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
> (2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}.
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
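
The zero check described in the pull request can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch in PR #5968: the class `FetchFailureGuard`, the method `sanitizeLimit`, and the fallback constant `ASSUMED_DEFAULT_LIMIT` are hypothetical names, and the fallback value of 10 is an assumption rather than something stated in the issue. Only the shape of the boolean condition is taken from the snippet quoted in the report.

```java
// Hypothetical helper, not part of Hadoop: shows a zero check placed
// in front of the modulo that the issue identifies as the crash site.
final class FetchFailureGuard {

    // Assumed fallback for illustration only; the issue does not state a default.
    static final int ASSUMED_DEFAULT_LIMIT = 10;

    private FetchFailureGuard() {
    }

    /** Returns the configured limit if it is positive, otherwise the fallback. */
    static int sanitizeLimit(int configured, int fallback) {
        if (fallback <= 0) {
            throw new IllegalArgumentException("fallback must be positive");
        }
        return configured > 0 ? configured : fallback;
    }

    /**
     * Same shape as the condition quoted in the issue. The caller must pass
     * a non-zero limit, otherwise the modulo throws ArithmeticException.
     */
    static boolean shouldReport(boolean connectExcpt,
                                boolean reportReadErrorImmediately,
                                boolean readError,
                                int failures,
                                int limit,
                                boolean hostFailed) {
        return connectExcpt
            || (reportReadErrorImmediately && readError)
            || ((failures % limit) == 0)
            || hostFailed;
    }

    public static void main(String[] args) {
        // A misconfigured value of 0 is replaced before it reaches the modulo.
        int limit = sanitizeLimit(0, ASSUMED_DEFAULT_LIMIT);
        System.out.println("effective limit = " + limit);
        System.out.println("report after 3 failures: "
            + shouldReport(false, false, false, 3, limit, false));
    }
}
```

Falling back to a positive value keeps the job running under a bad configuration. Rejecting the value with an exception at startup would be an equally reasonable design if failing fast is preferred.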