[ https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756556#comment-17756556 ]
ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------
moonlightingLL opened a new pull request, #5968:
URL: https://github.com/apache/hadoop/pull/5968
### Description of PR
Add a zero check for `maxFetchFailuresBeforeReporting`, since the parameter
`mapreduce.reduce.shuffle.maxfetchfailures` is never validated. A value of 0
leads to an integer division by zero that throws `ArithmeticException` and
crashes the system.
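A minimal sketch of the kind of check described above, assuming the fix rejects non-positive values as soon as the configured value is read. The class `FetchFailureReporting`, its methods, and the choice to throw `IllegalArgumentException` (rather than, for example, falling back to a default) are hypothetical and not taken from this PR; only the boolean condition in `shouldReport` is copied from the `checkAndInformMRAppMaster` excerpt quoted in the issue.

```java
/**
 * Hypothetical stand-alone sketch, not the code from this PR.
 * Validates the "report every N fetch failures" setting once, up front,
 * so the modulo in the reporting condition can never divide by zero.
 */
final class FetchFailureReporting {

  /** Config key named in the issue; used here only for the error message. */
  static final String MAX_FETCH_FAILURES_KEY =
      "mapreduce.reduce.shuffle.maxfetchfailures";

  private final int maxFetchFailuresBeforeReporting;

  FetchFailureReporting(int maxFetchFailuresBeforeReporting) {
    // Assumption: 0 and negative values are treated as configuration errors.
    if (maxFetchFailuresBeforeReporting <= 0) {
      throw new IllegalArgumentException(MAX_FETCH_FAILURES_KEY
          + " must be positive, got " + maxFetchFailuresBeforeReporting);
    }
    this.maxFetchFailuresBeforeReporting = maxFetchFailuresBeforeReporting;
  }

  /** Same condition as in checkAndInformMRAppMaster, now safe to evaluate. */
  boolean shouldReport(int failures, boolean connectExcpt,
      boolean reportReadErrorImmediately, boolean readError,
      boolean hostFailed) {
    return connectExcpt || (reportReadErrorImmediately && readError)
        || (failures % maxFetchFailuresBeforeReporting) == 0 || hostFailed;
  }

  public static void main(String[] args) {
    FetchFailureReporting r = new FetchFailureReporting(10);
    System.out.println(r.shouldReport(10, false, false, false, false)); // true
    System.out.println(r.shouldReport(3, false, false, false, false));  // false
    try {
      new FetchFailureReporting(0);
    } catch (IllegalArgumentException e) {
      // Misconfiguration is reported at setup time instead of crashing later.
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast at construction surfaces the bad setting with a message that names the key, instead of an `/ by zero` deep inside the shuffle. Silently substituting a default would also avoid the crash, but would hide the misconfiguration.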
### How was this patch tested?
Set `mapreduce.reduce.shuffle.maxfetchfailures=0` and
`mapreduce.reduce.shuffle.notify.readerror=false`, then run
`mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap`.
> ShuffleSchedulerImpl causes ArithmeticException due to improper
> detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7445
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.3.3
> Reporter: ConfX
> Priority: Critical
> Attachments: reproduce.sh
>
>
> h2. What happened
> There is no value checking for the parameter
> {{mapreduce.reduce.shuffle.maxfetchfailures}}. An invalid value such as 0 leads
> to improper calculations, such as a division by zero, that crash the system.
> h2. Buggy code
> In {{ShuffleSchedulerImpl.java}}, {{maxFetchFailuresBeforeReporting}} is taken
> from this parameter without any value checking and is used directly in the
> method {{checkAndInformMRAppMaster}}. When {{maxFetchFailuresBeforeReporting}}
> is mistakenly set to 0, the expression {{failures % maxFetchFailuresBeforeReporting}}
> divides by zero and throws an {{ArithmeticException}}, crashing the system.
>
> {noformat}
> private void checkAndInformMRAppMaster(
>     ...
>   if (connectExcpt || (reportReadErrorImmediately && readError)
>       || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
>     ...
>   }
> {noformat}
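> The crash follows from Java integer semantics: the remainder operator {{%}} on {{int}} operands throws {{ArithmeticException}} when the divisor is zero, just like {{/}}. The stand-alone sketch below evaluates the same expression as the excerpt; the class name {{ModuloByZeroDemo}} and the helper {{evaluate}} are illustrative only and not part of Hadoop.
>
> {code:java}
final class ModuloByZeroDemo {
  // Evaluates the reporting condition from checkAndInformMRAppMaster and
  // returns a description of the outcome instead of letting the exception escape.
  static String evaluate(int failures, int maxFetchFailuresBeforeReporting) {
    try {
      boolean report = (failures % maxFetchFailuresBeforeReporting) == 0;
      return "report = " + report;
    } catch (ArithmeticException e) {
      return e.toString();
    }
  }

  public static void main(String[] args) {
    System.out.println(evaluate(10, 10)); // report = true
    System.out.println(evaluate(3, 0));   // java.lang.ArithmeticException: / by zero
  }
}
> {code}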
> h2. How to reproduce
> (1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and
> {{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
> (2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}.
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285)
> {noformat}
> For easy reproduction, run the attached {{reproduce.sh}}.
> We are happy to provide a patch if this issue is confirmed.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)