[ 
https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654616#comment-16654616
 ] 

TezQA commented on TEZ-3976:
----------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  1m 
33s{color} | {color:red} root in master failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in master failed with JDK 
v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
10s{color} | {color:red} tez-api in master failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in master failed with JDK 
v1.8.0_172. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in master failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in master failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in master failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in master failed with JDK 
v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
10s{color} | {color:red} tez-api in master failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
10s{color} | {color:red} tez-runtime-library in master failed with JDK 
v1.8.0_172. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
11s{color} | {color:red} tez-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_181. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
8s{color} | {color:red} tez-runtime-library in the patch failed with JDK 
v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  9s{color} 
| {color:red} tez-api in the patch failed with JDK v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  8s{color} 
| {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_181. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in the patch failed with JDK 
v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  9s{color} 
| {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  9s{color} 
| {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. 
{color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 17s{color} | {color:orange} tez-api: The patch generated 14 new + 12 
unchanged - 2 fixed = 26 total (was 14) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} tez-runtime-library: The patch generated 18 new 
+ 588 unchanged - 3 fixed = 606 total (was 591) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
9s{color} | {color:red} tez-runtime-library in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_181. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
10s{color} | {color:red} tez-runtime-library in the patch failed with JDK 
v1.8.0_181. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
9s{color} | {color:red} tez-api in the patch failed with JDK v1.8.0_172. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
10s{color} | {color:red} tez-runtime-library in the patch failed with JDK 
v1.8.0_172. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  9s{color} 
| {color:red} tez-api in the patch failed with JDK v1.8.0_172. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 10s{color} 
| {color:red} tez-runtime-library in the patch failed with JDK v1.8.0_172. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  8m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-3976 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944472/TEZ-3976.5.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
checkstyle  compile  |
| uname | Linux asf912.gq1.ygridcore.net 4.4.0-133-generic #159-Ubuntu SMP Fri 
Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/yetus/precommit/personality/tez.sh
 |
| git revision | master / 9277815 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_172 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-openjdk-amd64:1.8.0_181 
/usr/local/asfpackages/java/jdk1.8.0_172:1.8.0_172 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-mvninstall-root.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-api-jdk1.8.0_181.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-runtime-library-jdk1.8.0_181.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-api-jdk1.8.0_172.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-compile-tez-runtime-library-jdk1.8.0_172.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-findbugs-tez-api.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-findbugs-tez-runtime-library.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-api-jdk1.8.0_181.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-runtime-library-jdk1.8.0_181.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-api-jdk1.8.0_172.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/branch-javadoc-tez-runtime-library-jdk1.8.0_172.txt
 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-mvninstall-tez-api.txt
 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-mvninstall-tez-runtime-library.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_181.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_181.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_181.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_181.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_172.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_172.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-api-jdk1.8.0_172.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-compile-tez-runtime-library-jdk1.8.0_172.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/diff-checkstyle-tez-api.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/diff-checkstyle-tez-runtime-library.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-findbugs-tez-api.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-findbugs-tez-runtime-library.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-api-jdk1.8.0_181.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-runtime-library-jdk1.8.0_181.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-api-jdk1.8.0_172.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-javadoc-tez-runtime-library-jdk1.8.0_172.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-unit-tez-api-jdk1.8.0_172.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/artifact/out/patch-unit-tez-runtime-library-jdk1.8.0_172.txt
 |
| JDK v1.8.0_172  Test Results | 
https://builds.apache.org/job/PreCommit-TEZ-Build/32/testReport/ |
| modules | C: tez-api tez-runtime-library U: . |
| Console output | https://builds.apache.org/job/PreCommit-TEZ-Build/32/console 
|
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> ShuffleManager reporting too many errors
> ----------------------------------------
>
>                 Key: TEZ-3976
>                 URL: https://issues.apache.org/jira/browse/TEZ-3976
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Major
>         Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, 
> TEZ-3976.4.patch, TEZ-3976.5.patch
>
>
> The symptoms are a lot of these logs are being shown:
> {code:java}
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #0 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_1529044441963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_1529044441963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0], connectFailed: true
> 2018-06-15T18:09:35,811 WARN  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for 
> tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, 
> pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]]
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_1529044441963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0], connectFailed: true
> {code}
> Each of those translate into an event in the AM which finally crashes due to 
> OOM after around 30 minutes and around 10 million shuffle input errors (and 
> 10 million lines like the previous ones). When the ShufflerManager is closed 
> and the counters reported there are many shuffle input errors, some of those 
> logs are:
> {code:java}
> 2018-06-15T17:46:30,988  INFO [TezTR-441963_21_34_4_0_4 
> (1529044441963_0021_34_04_000000_4)] runtime.LogicalIOProcessorRuntimeTask: 
> Final Counters for attempt_1529044441963_0021_34_04_000000_4: Counters: 43 
> [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, 
> NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, 
> OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, 
> OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, 
> SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, 
> SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, 
> SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, 
> FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE 
> RECORDS_OUT_INTERMEDIATE_^[[1;35;40m^[[KReducer_12^[[m^[[K=1, 
> RECORDS_OUT_OPERATOR_GBY_159=1, 
> RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_^[[1;35;40m^[[KReducer_12^[[m^[[K_INPUT_Map_11
>  FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, 
> LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, 
> SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, 
> SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, 
> SHUFFLE_PHASE_TIME=52516][TaskCounter_^[[1;35;40m^[[KReducer_12^[[m^[[K_OUTPUT_Map_1
>  ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, 
> OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, 
> SPILLED_RECORDS=0]]
> 2018-06-15T17:46:32,271 INFO  [TezTR-441963_21_34_3_15_1 ()] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for 
> attempt_1529044441963_0021_34_03_000015_1: Counters: 87 [[File System 
> Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, 
> FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, 
> HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, 
> HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter 
> SPILLED_RECORDS=0, NUM_SHUFFLED_INPUTS=1, NUM_FAILED_SHUFFLE_INPUTS=105195, 
> INPUT_RECORDS_PROCESSED=397, INPUT_SPLIT_LENGTH_BYTES=21563271, 
> OUTPUT_RECORDS=15737, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=1235818, 
> OUTPUT_BYTES_WITH_OVERHEAD=1267307, OUTPUT_BYTES_PHYSICAL=357520, 
> ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, 
> ADDITIONAL_SPILL_COUNT=0, SHUFFLE_BYTES=31, SHUFFLE_BYTES_DECOMPRESSED=17, 
> SHUFFLE_BYTES_TO_MEM=31, SHUFFLE_BYTES_TO_DISK=0, 
> SHUFFLE_BYTES_DISK_DIRECT=0, SHUFFLE_PHASE_TIME=50525, 
> FIRST_EVENT_RECEIVED=9, LAST_EVENT_RECEIVED=61][HIVE DESERIALIZE_ERRORS=0, 
> RECORDS_IN_Map_11=395611, RECORDS_OUT_INTERMEDIATE_Map_11=15737, 
> RECORDS_OUT_OPERATOR_FIL_152=395611, RECORDS_OUT_OPERATOR_GBY_157=1, 
> RECORDS_OUT_OPERATOR_MAPJOIN_154=15736, RECORDS_OUT_OPERATOR_MAP_0=0, 
> RECORDS_OUT_OPERATOR_RS_155=15736, RECORDS_OUT_OPERATOR_RS_158=1, 
> RECORDS_OUT_OPERATOR_SEL_153=395611, RECORDS_OUT_OPERATOR_SEL_156=15736, 
> RECORDS_OUT_OPERATOR_TS_26=395611][TaskCounter_Map_11_INPUT_Map_13 
> FIRST_EVENT_RECEIVED=9, INPUT_RECORDS_PROCESSED=1, LAST_EVENT_RECEIVED=61, 
> NUM_FAILED_SHUFFLE_INPUTS=105195, NUM_SHUFFLED_INPUTS=1, SHUFFLE_BYTES=31, 
> SHUFFLE_BYTES_DECOMPRESSED=17, SHUFFLE_BYTES_DISK_DIRECT=0, 
> SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=31, 
> SHUFFLE_PHASE_TIME=50525][TaskCounter_Map_11_INPUT_supplier 
> INPUT_RECORDS_PROCESSED=396, 
> INPUT_SPLIT_LENGTH_BYTES=21563271][TaskCounter_Map_11_OUTPUT_Reducer_12 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779474, OUTPUT_BYTES_PHYSICAL=164787, 
> OUTPUT_BYTES_WITH_OVERHEAD=779485, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, 
> SPILLED_RECORDS=0][TaskCounter_Map_11_OUTPUT_Reducer_6 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=456344, OUTPUT_BYTES_PHYSICAL=192733, 
> OUTPUT_BYTES_WITH_OVERHEAD=487822, OUTPUT_LARGE_RECORDS=0, 
> OUTPUT_RECORDS=15736, 
> SPILLED_RECORDS=0][org.apache.hadoop.hive.llap.counters.LlapIOCounters 
> ALLOCATED_BYTES=9633792, ALLOCATED_USED_BYTES=7976706, CACHE_HIT_BYTES=0, 
> CACHE_MISS_BYTES=2344364, CONSUMER_TIME_NS=1136392475, 
> DECODE_TIME_NS=140377915, HDFS_TIME_NS=145825282, METADATA_CACHE_MISS=4, 
> NUM_DECODED_BATCHES=41, NUM_VECTOR_BATCHES=396, ROWS_EMITTED=395611, 
> SELECTED_ROWGROUPS=41, TOTAL_IO_TIME_NS=1234990631]]
> {code}
> I think this is happening because the fetcher is in [this 
> loop|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java#L366]
>  which is started 
> [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L69]
>  and not stopped until the [processor is 
> closed|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java#L387]
>  (which is called from 
> [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L83]
>  or 
> [here|https://github.com/apache/tez/blob/fe22f3276d6d97f6b5dfab24490ee2ca32bf73c3/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TaskRunner2Callable.java#L110]).
> In {{ShuffleManager}} we seem to keep track of 
> {{TaskCounter.NUM_FAILED_SHUFFLE_INPUTS}} but nothing is done when the value 
> gets too high. Maybe something similar to {{ShuffleScheduler}} should be done 
> where the fetchers are only retried a certain amount of times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to