[ https://issues.apache.org/jira/browse/TAJO-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinho Kim updated TAJO-1560:
----------------------------
Fix Version/s: 0.10.1, 0.11.0
> HashShuffle report should be ignored when no succeeded tasks are included
> -------------------------------------------------------------------------
>
> Key: TAJO-1560
> URL: https://issues.apache.org/jira/browse/TAJO-1560
> Project: Tajo
> Issue Type: Bug
> Components: data shuffle, query master
> Affects Versions: 0.10.0
> Reporter: Jinho Kim
> Assignee: Jinho Kim
> Priority: Critical
> Fix For: 0.11.0, 0.10.1
>
>
> Currently, the hash shuffle report is always sent to the stage. If one
> worker runs all the tasks very quickly, the other workers receive a shouldDie
> message without having executed any task, but they still send a report.
> Additionally, range shuffle does not need a hash shuffle report. Waiting for
> it is just unnecessary.
> {noformat}
> 2015-04-16 02:05:49,063 INFO org.apache.tajo.querymaster.Stage: Stage
> finalize - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
> 2015-04-16 02:05:49,063 INFO
> org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler
> schedulingThread stopped
> 2015-04-16 02:05:49,064 INFO
> org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.querymaster.QueryMaster: cleanup
> executionBlocks:
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Received
> ShouldDie
> flag:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Stop
> TaskRunner:
> eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunnerManager: Stop
> Task:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,065 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000001, waiting for shuffle reports. expected Tasks:3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager:
> ======================== Processing eb_1429088098190_1356_000001 of type STOP
> 2015-04-16 02:05:49,066 INFO
> org.apache.tajo.storage.HashShuffleAppenderManager: Close
> HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
> 2015-04-16 02:05:49,066 INFO
> org.apache.tajo.storage.HashShuffleAppenderManager: Close
> HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager:
> Stopped execution block:eb_1429088098190_1356_000001
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000001, Received shuffle report: 2/3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000001, Finalized shuffle reports: 3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: Stage
> completed - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Processing
> q_1429088098190_1356 of type STAGE_COMPLETED
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000002, Outer volume: 0.0MB, Inner volume: 1.0MB
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000002, Bigger Table's volume is approximately 1 MB
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> eb_1429088098190_1356_000002, The determined number of join partitions is 1
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage:
> org.apache.tajo.querymaster.DefaultTaskScheduler is chosen for the task
> scheduling for eb_1429088098190_1356_000002
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Scheduling
> Stage:eb_1429088098190_1356_000002
> 2015-04-16 02:05:49,068 INFO org.apache.tajo.storage.FileStorageManager:
> Total input paths to process : 11
> 2015-04-16 02:05:49,068 ERROR org.apache.tajo.querymaster.Stage: Can't handle
> this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:SUCCEEDED,
> nextState:SUCCEEDED
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> SQ_SHUFFLE_REPORT at SUCCEEDED
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.querymaster.Stage.handle(Stage.java:743)
> at
> org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:226)
> at
> org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:220)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-16 02:05:49,068 INFO org.apache.tajo.querymaster.QueryMaster: cleanup
> executionBlocks:
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing
> q_1429088098190_1356 of type STAGE_COMPLETED
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing
> q_1429088098190_1356 of type QUERY_COMPLETED
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query:
> q_1429088098190_1356 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
> {noformat}
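
The behaviour requested in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: `ShuffleReportSketch`, `ShuffleKind`, `shouldSendReport` and `expectedReports` are hypothetical names that do not come from the Tajo code base, and the idea that the stage counts one report per worker with at least one succeeded task is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Tajo's real classes or methods.
class ShuffleReportSketch {

    // Assumed shuffle kinds, for illustration only.
    enum ShuffleKind { HASH, RANGE, NONE }

    // Worker side: decide whether a hash shuffle report should be sent at all.
    static boolean shouldSendReport(ShuffleKind kind, int succeededTasks) {
        // Range shuffle (or no shuffle) does not need a hash shuffle report.
        if (kind != ShuffleKind.HASH) {
            return false;
        }
        // A worker stopped by shouldDie before finishing any task has nothing to report.
        return succeededTasks > 0;
    }

    // Stage side: count how many reports are worth waiting for, given the
    // number of succeeded tasks on each worker (an assumed bookkeeping model).
    static int expectedReports(ShuffleKind kind, List<Integer> succeededPerWorker) {
        int expected = 0;
        for (int succeeded : succeededPerWorker) {
            if (shouldSendReport(kind, succeeded)) {
                expected++;
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // One fast worker ran all three tasks; two idle workers only got shouldDie.
        List<Integer> perWorker = Arrays.asList(3, 0, 0);
        System.out.println("hash shuffle reports to wait for: "
                + expectedReports(ShuffleKind.HASH, perWorker));
        System.out.println("range shuffle reports to wait for: "
                + expectedReports(ShuffleKind.RANGE, perWorker));
    }
}
```

With a guard of this kind, a worker that never ran a task would not emit a late SQ_SHUFFLE_REPORT after the stage has already reached SUCCEEDED, which is the event the log above fails on.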
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)