[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor edited comment on HIVE-24111 at 9/4/20, 4:17 PM:
--------------------------------------------------------------

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

what is strange for the first sight, I cannot see MergeManager related log 
messages when it's expected, so this could be a shuffle issue

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_000000_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_000000_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_000000_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_000000_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_000000_0
2020-09-04T02:12:16,416  INFO [TezChild] tez.ReduceRecordProcessor: Waiting for 
ShuffleInputs to become ready
{code}

so in the hanging run we reach the point of ShuffleScheduler instantiation, but 
never reach this point, which is called from ShuffleScheduler.start()
{code}
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
{code}


was (Author: abstractdog):
For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

what is strange for the first sight, I cannot see MergeManager related log 
messages when it's expected, so this could be a shuffle issue

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_000000_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_000000_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_000000_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_000000_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_000000_0
2020-09-04T02:12:16,416  INFO [TezChild] tez.ReduceRecordProcessor: Waiting for 
ShuffleInputs to become ready
{code}

so in the hanging run we reach the point of ShuffleScheduler instantiation, but 
never reach this point:
{code}
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
{code}

> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24111
>                 URL: https://issues.apache.org/jira/browse/HIVE-24111
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TestCrudCompactorTez.log, jstack.log, 
> org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log
>
>
> Reproduced issue in ptest run which I made to run against tez staging 
> artifacts 
> (https://repository.apache.org/content/repositories/orgapachetez-1068/)
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we 
> won't confirm if it's a hive or tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup:
> https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to