[ https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654616#comment-16654616 ]
TezQA commented on TEZ-3976:
----------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 33s{color} | {color:red} root in master failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-api in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 10s{color} | {color:red} tez-api in master failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in master failed with JDK v1.8.0_172. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} tez-api in master failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in master failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} tez-api in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s{color} | {color:red} tez-api in master failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s{color} | {color:red} tez-runtime-library in master failed with JDK v1.8.0_172. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} tez-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 8s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 8s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} tez-api: The patch generated 14 new + 12 unchanged - 2 fixed = 26 total (was 14) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} tez-runtime-library: The patch generated 18 new + 588 unchanged - 3 fixed = 606 total (was 591) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} tez-runtime-library in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 10s{color} | {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 8m 38s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-3976 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944472/TEZ-3976.5.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux asf912.gq1.ygridcore.net 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/yetus/precommit/personality/tez.sh |
| git revision | master / 9277815 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_172 |
| Multi-JDK versions | /usr/lib/jvm/java-8-openjdk-amd64:1.8.0_181 /usr/local/asfpackages/java/jdk1.8.0_172:1.8.0_172 |
| mvninstall | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-mvninstall-root.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-api-jdk1.8.0_181.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-runtime-library-jdk1.8.0_181.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-api-jdk1.8.0_172.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-runtime-library-jdk1.8.0_172.txt |
| findbugs | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-findbugs-tez-api.txt |
| findbugs | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-findbugs-tez-runtime-library.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-api-jdk1.8.0_181.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-runtime-library-jdk1.8.0_181.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-api-jdk1.8.0_172.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-runtime-library-jdk1.8.0_172.txt |
| mvninstall | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-mvninstall-tez-api.txt |
| mvninstall | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-mvninstall-tez-runtime-library.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_181.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_181.txt |
| javac | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_181.txt |
| javac | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_181.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_172.txt |
| compile | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_172.txt |
| javac | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_172.txt |
| javac | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_172.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/diff-checkstyle-tez-api.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/diff-checkstyle-tez-runtime-library.txt |
| findbugs | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-findbugs-tez-api.txt |
| findbugs | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-findbugs-tez-runtime-library.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-api-jdk1.8.0_181.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-runtime-library-jdk1.8.0_181.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-api-jdk1.8.0_172.txt |
| javadoc | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-runtime-library-jdk1.8.0_172.txt |
| unit | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-unit-tez-api-jdk1.8.0_172.txt |
| unit | https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-unit-tez-runtime-library-jdk1.8.0_172.txt |
| JDK v1.8.0_172 Test Results | https://builds.apache.org/job/PreCommit-TEZ-Build/32/testReport/ |
| modules | C: tez-api tez-runtime-library U: . |
| Console output | https://builds.apache.org/job/PreCommit-TEZ-Build/32/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> ShuffleManager reporting too many errors
> ----------------------------------------
>
> Key: TEZ-3976
> URL: https://issues.apache.org/jira/browse/TEZ-3976
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jaume M
> Assignee: Jaume M
> Priority: Major
> Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, TEZ-3976.4.patch, TEZ-3976.5.patch
>
>
> The symptom is that a lot of these logs are being shown:
> {code:java}
> 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #0 ()] org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, attemptNumber=0, pathComponent=attempt_1529044441963_0021_34_01_000701_0_12541_0, spillType=2, spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, attemptNumber=0, pathComponent=attempt_1529044441963_0021_34_01_000701_0_12541_0, spillType=2, spillId=0], connectFailed: true
> 2018-06-15T18:09:35,811 WARN [Fetcher_B {Reducer_5} #1 ()] org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, spillId=0]]
> 2018-06-15T18:09:35,811 INFO [Fetcher_B {Reducer_5} #1 ()] org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, spillId=0], connectFailed: true
> {code}
> Each of those translates into an event in the AM, which finally crashes due to
> OOM after around 30 minutes and around 10 million shuffle input errors (and
> 10 million lines like the previous ones).
> When the ShuffleManager is closed and the counters are reported, there are many
> shuffle input errors. Some of those logs are:
> {code:java}
> 2018-06-15T17:46:30,988 INFO [TezTR-441963_21_34_4_0_4 (1529044441963_0021_34_04_000000_4)] runtime.LogicalIOProcessorRuntimeTask: Final Counters for attempt_1529044441963_0021_34_04_000000_4: Counters: 43 [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE RECORDS_OUT_INTERMEDIATE_Reducer_12=1, RECORDS_OUT_OPERATOR_GBY_159=1, RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11 FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1 ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, SPILLED_RECORDS=0]]
> 2018-06-15T17:46:32,271 INFO [TezTR-441963_21_34_3_15_1 ()] org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for attempt_1529044441963_0021_34_03_000015_1: Counters: 87 [[File System Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTPUT_BYTES_PHYSICAL=357520, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_BYTES=31, SHUFFLE_BYTES_DECOMPRESSED=17, SHUFFLE_BYTES_TO_MEM=31, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=0, SHUFFLE_PHASE_TIME=50525, FIRST_EVENT_RECEIVED=9, LAST_EVENT_RECEIVED=61][HIVE DESERIALIZE_ERRORS=0, RECORDS_IN_Map_11=395611, RECORDS_OUT_INTERMEDIATE_Map_11=15737, RECORDS_OUT_OPERATOR_FIL_152=395611, RECORDS_OUT_OPERATOR_GBY_157=1, RECORDS_OUT_OPERATOR_MAPJOIN_154=15736, RECORDS_OUT_OPERATOR_MAP_0=0, RECORDS_OUT_OPERATOR_RS_155=15736, RECORDS_OUT_OPERATOR_RS_158=1, RECORDS_OUT_OPERATOR_SEL_153=395611, RECORDS_OUT_OPERATOR_SEL_156=15736, RECORDS_OUT_OPERATOR_TS_26=395611][TaskCounter_Map_11_INPUT_Map_13 FIRST_EVENT_RECEIVED=9, INPUT_RECORDS_PROCESSED=1, LAST_EVENT_RECEIVED=61, NUM_FAILED_SHUFFLE_INPUTS=105195, NUM_SHUFFLED_INPUTS=1, SHUFFLE_BYTES=31, SHUFFLE_BYTES_DECOMPRESSED=17, SHUFFLE_BYTES_DISK_DIRECT=0, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=31, SHUFFLE_PHASE_TIME=50525][TaskCounter_Map_11_INPUT_supplier INPUT_RECORDS_PROCESSED=396, INPUT_SPLIT_LENGTH_BYTES=21563271][TaskCounter_Map_11_OUTPUT_Reducer_12 ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779474, OUTPUT_BYTES_PHYSICAL=164787, OUTPUT_BYTES_WITH_OVERHEAD=779485, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, SPILLED_RECORDS=0][TaskCounter_Map_11_OUTPUT_Reducer_6 ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=456344, OUTPUT_BYTES_PHYSICAL=192733, OUTPUT_BYTES_WITH_OVERHEAD=487822, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=15736, SPILLED_RECORDS=0][org.apache.hadoop.hive.llap.counters.LlapIOCounters ALLOCATED_BYTES=9633792, ALLOCATED_USED_BYTES=7976706, CACHE_HIT_BYTES=0, CACHE_MISS_BYTES=2344364, CONSUMER_TIME_NS=1136392475, DECODE_TIME_NS=140377915, HDFS_TIME_NS=145825282, METADATA_CACHE_MISS=4, NUM_DECODED_BATCHES=41, NUM_VECTOR_BATCHES=396, ROWS_EMITTED=395611, SELECTED_ROWGROUPS=41, TOTAL_IO_TIME_NS=1234990631]]
> {code}
> I think this is happening because the fetcher is in [this loop|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java#L366]
> which is started [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L69]
> and not stopped until the [processor is closed|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java#L387]
> (which is called from [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L83]
> or [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L110]).
> In {{ShuffleManager}} we seem to keep track of
> {{TaskCounter.NUM_FAILED_SHUFFLE_INPUTS}}, but nothing is done when the value
> gets too high. Maybe something similar to {{ShuffleScheduler}} should be done,
> where the fetchers are only retried a certain number of times.
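
The idea of capping fetch retries can be sketched roughly as below. This is only an illustration under stated assumptions, not Tez code and not the actual patch: {{FetchFailureBudget}}, its methods, and the threshold of 3 are all hypothetical names and values, and the real {{ShuffleScheduler}} logic may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Tez): tolerates a bounded number of
 * fetch failures, after which the caller is told to stop retrying.
 */
public class FetchFailureBudget {
    private final int maxFailures;
    private final AtomicInteger failures = new AtomicInteger();

    public FetchFailureBudget(int maxFailures) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.maxFailures = maxFailures;
    }

    /**
     * Records one failed fetch. Returns true while the total number of
     * failures is still within the budget, false once it is exceeded.
     */
    public boolean recordFailureAndCheck() {
        return failures.incrementAndGet() <= maxFailures;
    }

    /** Total number of failures recorded so far. */
    public int failureCount() {
        return failures.get();
    }

    public static void main(String[] args) {
        // Assumed threshold, for illustration only.
        FetchFailureBudget budget = new FetchFailureBudget(3);
        int attempts = 0;
        // Simulated fetch loop in which every attempt fails to connect.
        while (true) {
            attempts++;
            if (!budget.recordFailureAndCheck()) {
                break; // budget exceeded: stop retrying instead of looping forever
            }
        }
        System.out.println("Stopped after " + attempts + " failed attempts");
    }
}
```

In a real fetcher the point where the budget is exceeded would presumably report a fatal error for the input rather than just leaving the loop, so the task fails fast instead of emitting one event per failed fetch.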