[ https://issues.apache.org/jira/browse/TEZ-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-4194:
----------------------------------
    Attachment: 
NPE_TASK_syslog_attempt_1592898862823_0002_1_01_000120_0_apache.log

> NPE in FetcherOrderedGrouped 
> -----------------------------
>
>                 Key: TEZ-4194
>                 URL: https://issues.apache.org/jira/browse/TEZ-4194
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: 
> NPE_TASK_syslog_attempt_1592898862823_0002_1_01_000120_0_apache.log
>
>
> When running store_sales data generation (at 1 TB scale) in a cloud 
> environment, a NullPointerException was observed in FetcherOrderedGrouped.
> It causes the task to fail and re-execute, and shows up in roughly one in 
> every 2 to 4 runs. It is not clear why the fetcher is still being invoked 
> when, according to the log, all inputs have already been downloaded.
>  
> {noformat}
> 2020-06-23 08:49:34,546 [INFO] [Fetcher_O {Map_1} #1] 
> |orderedgrouped.ShuffleScheduler|: All inputs fetched for input vertex : Map 1
> 2020-06-23 08:49:34,546 [INFO] [Fetcher_O {Map_1} #1] 
> |orderedgrouped.ShuffleScheduler|: copy(2385 (spillsFetched=2385) of 2385. 
> Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
> 2020-06-23 08:49:34,546 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.ShuffleScheduler|: Shutting down FetchScheduler for input: 
> Map_1, wasInterrupted=false
> 2020-06-23 08:49:34,547 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.ShuffleScheduler|: copy(2385 (spillsFetched=2385) of 2385. 
> Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
> 2020-06-23 08:49:34,548 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.ShuffleScheduler|: Shutting down fetchers for input: Map_1, 
> shutdown timetaken: 0 ms, hasFetcherExecutorStopped: true
> 2020-06-23 08:49:34,549 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.MergeManager|: TotalInMemFetchStats: count=2115, 
> totalSize=28250798, min=2011, max=17268, avg=1.0
> 2020-06-23 08:49:34,549 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.MergeManager|: finalMerge with #inMemoryOutputs=2115, 
> size=28250798 and #onDiskOutputs=144, size=3487553
> 2020-06-23 08:49:34,686 [INFO] [Fetcher_O {Map_1} #3] 
> |orderedgrouped.Shuffle|: Map_1: Setting throwable in reportException with 
> message [null] from thread [Fetcher_O {Map_1} #3
> 2020-06-23 08:49:34,687 [INFO] [Fetcher_O {Map_1} #3] 
> |orderedgrouped.ShuffleScheduler|: Map_1: Already shutdown. Ignoring fetch 
> complete
> 2020-06-23 08:49:35,044 [ERROR] [ShuffleAndMergeRunner {Map_1}] 
> |orderedgrouped.Shuffle|: Map_1: ShuffleRunner failed with error
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
>  error in shuffle in Fetcher_O {Map_1} #3
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:332)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>         at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:354)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:263)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
>         ... 7 more
> 2020-06-23 08:49:35,046 [INFO] [ShuffleAndMergeRunner {Map_1}] 
> |task.TezTaskRunner2|: Received notification of a  failure  which will cause 
> the task to die
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
>  error in shuffle in Fetcher_O {Map_1} #3
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:332)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>         at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:354)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:263)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
>         ... 7 more
>  {noformat}
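
In the log above, Fetcher_O {Map_1} #3 reports an exception at 08:49:34,686, after the scheduler logged at 08:49:34,548 that the fetchers had been shut down. One possible reading is that a late fetcher call dereferences state that shutdown has already released. This is an assumption; the report does not say what is null at FetcherOrderedGrouped.java:354. The Java sketch below illustrates only that general pattern and one defensive guard. FetcherShutdownSketch, Fetcher, Connection and all of their methods are hypothetical names. They are not Tez classes, and this is not the Tez fix.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a shutdown race; none of these names are Tez classes.
class FetcherShutdownSketch {

    /** Stand-in for a connection object that a shutdown path may release. */
    static final class Connection {
        boolean connect() {
            return true;
        }
    }

    /** Stand-in for a fetcher that another thread can shut down. */
    static final class Fetcher {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private volatile Connection connection = new Connection();

        /** Assumed shutdown path: marks the fetcher stopped and drops the connection. */
        void shutDown() {
            stopped.set(true);
            connection = null;
        }

        /** Unguarded: dereferences the field directly, so a late call throws an NPE. */
        boolean setupConnectionUnguarded() {
            return connection.connect();
        }

        /** Guarded: snapshots the field, then bails out if stopped or already released. */
        boolean setupConnectionGuarded() {
            Connection local = connection;
            if (stopped.get() || local == null) {
                return false; // nothing left to fetch; do not report an error
            }
            return local.connect();
        }
    }

    /** Returns true if the unguarded path throws an NPE when called after shutdown. */
    static boolean unguardedThrowsAfterShutdown() {
        Fetcher fetcher = new Fetcher();
        fetcher.shutDown();
        try {
            fetcher.setupConnectionUnguarded();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Returns what the guarded path yields when called after shutdown. */
    static boolean guardedResultAfterShutdown() {
        Fetcher fetcher = new Fetcher();
        fetcher.shutDown();
        return fetcher.setupConnectionGuarded();
    }

    public static void main(String[] args) {
        System.out.println("unguarded throws NPE: " + unguardedThrowsAfterShutdown());
        System.out.println("guarded returns: " + guardedResultAfterShutdown());
    }
}
```

Running main prints "unguarded throws NPE: true" and then "guarded returns: false". Reading the field into a local variable before the check keeps the guard safe even if shutdown runs between the check and the call.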



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
