[ https://issues.apache.org/jira/browse/TAJO-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501398#comment-14501398 ]

ASF GitHub Bot commented on TAJO-1560:
--------------------------------------

Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/538#discussion_r28644583
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java ---
    @@ -1300,6 +1301,52 @@ protected void stopFinalization() {
         stopShuffleReceiver.set(true);
       }
     
    +  private void finalizeShuffleReport(StageShuffleReportEvent event, ShuffleType type) {
    +    if(!checkIfNeedFinalizing(type)) return;
    +
    +    TajoWorkerProtocol.ExecutionBlockReport report = event.getReport();
    +
    +    if (!report.getReportSuccess()) {
    +      stopFinalization();
    +      LOG.error(getId() + ", " + type + " report are failed. Caused by:" + report.getReportErrorMessage());
    +      eventHandler.handle(new StageEvent(getId(), StageEventType.SQ_FAILED));
    +    }
    +
    +    completedShuffleTasks.addAndGet(report.getSucceededTasks());
    +    if (report.getIntermediateEntriesCount() > 0) {
    +      for (IntermediateEntryProto eachInterm : report.getIntermediateEntriesList()) {
    +        hashShuffleIntermediateEntries.add(new IntermediateEntry(eachInterm));
    +      }
    +    }
    +
    +    if (completedShuffleTasks.get() >= succeededObjectCount) {
    +      LOG.info(getId() + ", Finalized " + type + " reports: " + completedShuffleTasks.get());
    +      eventHandler.handle(new StageEvent(getId(), StageEventType.SQ_STAGE_COMPLETED));
    +      if (timeoutChecker != null) {
    +        stopFinalization();
    +        synchronized (timeoutChecker){
    +          timeoutChecker.notifyAll();
    +        }
    +      }
    +    } else {
    +      LOG.info(getId() + ", Received " + type + " reports " +
    +          completedShuffleTasks.get() + "/" + succeededObjectCount);
    +    }
    +  }
    +
    +  /**
    +   * HASH_SHUFFLE, SCATTERED_HASH_SHUFFLE should get report from worker nodes when ExecutionBlock is stopping.
    --- End diff --
    
    It would be great if you could add a comment describing why we don't need to collect reports when the shuffle type is RANGE_SHUFFLE.
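For illustration, here is a minimal, self-contained sketch of the kind of check and comment being asked for. The `ShuffleType` enum is a local stand-in (not Tajo's generated enum), and the body of `checkIfNeedFinalizing` is only a guess at its behavior, based on the javadoc in the diff and the issue description below. The stated reason for skipping RANGE_SHUFFLE is an assumption that should be confirmed against the Tajo source.

```java
import java.util.EnumSet;

/** Hypothetical, self-contained illustration; not the Tajo source. */
public class ShuffleReportPolicy {

  /** Local stand-in for the shuffle types named in this thread. */
  public enum ShuffleType { NONE_SHUFFLE, HASH_SHUFFLE, SCATTERED_HASH_SHUFFLE, RANGE_SHUFFLE }

  /**
   * Shuffle types whose output is only known to the QueryMaster after the
   * ExecutionBlock stops.
   *
   * HASH_SHUFFLE and SCATTERED_HASH_SHUFFLE: workers write partitions through
   * hash shuffle appenders that are closed when the ExecutionBlock stops, so
   * the intermediate entries arrive only in that stop-time report.
   *
   * RANGE_SHUFFLE (assumption): each task's output is reported when the task
   * completes, so there is nothing left to collect at stop time and waiting
   * for a report would only delay stage completion.
   */
  private static final EnumSet<ShuffleType> NEEDS_STOP_TIME_REPORT =
      EnumSet.of(ShuffleType.HASH_SHUFFLE, ShuffleType.SCATTERED_HASH_SHUFFLE);

  /** Guessed behavior of checkIfNeedFinalizing(type) from the diff. */
  public static boolean checkIfNeedFinalizing(ShuffleType type) {
    return NEEDS_STOP_TIME_REPORT.contains(type);
  }
}
```

With this sketch, `checkIfNeedFinalizing(ShuffleType.RANGE_SHUFFLE)` returns `false`, so a method shaped like `finalizeShuffleReport` would return immediately for range shuffles.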


> HashShuffle report should be ignored when a succeed tasks are not included
> --------------------------------------------------------------------------
>
>                 Key: TAJO-1560
>                 URL: https://issues.apache.org/jira/browse/TAJO-1560
>             Project: Tajo
>          Issue Type: Bug
>          Components: data shuffle, query master
>    Affects Versions: 0.10.0
>            Reporter: Jinho Kim
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.11.0, 0.10.1
>
>         Attachments: TAJO-1560.patch
>
>
> Currently, the hash shuffle report is always sent to the stage. If one worker
> runs all the tasks very quickly, another worker receives a ShouldDie message
> and does not execute any task, but its report is still sent.
> Additionally, range shuffle does not need a hash shuffle report at all, so
> waiting for one is unnecessary.
> {noformat}
> 2015-04-16 02:05:49,063 INFO org.apache.tajo.querymaster.Stage: Stage finalize - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
> 2015-04-16 02:05:49,063 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler schedulingThread stopped
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.querymaster.QueryMaster: cleanup executionBlocks:
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Received ShouldDie flag:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Stop TaskRunner: eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunnerManager: Stop Task:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
> 2015-04-16 02:05:49,065 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, waiting for shuffle reports. expected Tasks:3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager: ======================== Processing eb_1429088098190_1356_000001 of type STOP
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager: Stopped execution block:eb_1429088098190_1356_000001
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, Received shuffle report: 2/3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, Finalized shuffle reports: 3
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: Stage completed - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type STAGE_COMPLETED
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, Outer volume: 0.0MB, Inner volume: 1.0MB
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, Bigger Table's volume is approximately 1 MB
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, The determined number of join partitions is 1
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: org.apache.tajo.querymaster.DefaultTaskScheduler is chosen for the task scheduling for eb_1429088098190_1356_000002
> 2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Scheduling Stage:eb_1429088098190_1356_000002
> 2015-04-16 02:05:49,068 INFO org.apache.tajo.storage.FileStorageManager: Total input paths to process : 11
> 2015-04-16 02:05:49,068 ERROR org.apache.tajo.querymaster.Stage: Can't handle this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:SUCCEEDED, nextState:SUCCEEDED
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: SQ_SHUFFLE_REPORT at SUCCEEDED
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.tajo.querymaster.Stage.handle(Stage.java:743)
>       at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:226)
>       at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:220)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-04-16 02:05:49,068 INFO org.apache.tajo.querymaster.QueryMaster: cleanup executionBlocks:
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type STAGE_COMPLETED
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type QUERY_COMPLETED
> 2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: q_1429088098190_1356 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
> {noformat}
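A minimal sketch of the behavior the issue title asks for, under stated assumptions. The QueryMaster side ignores a stop-time report that carries zero succeeded tasks (for example, from a worker that received ShouldDie before running anything). It also ignores any report that arrives after the stage has already finalized, which is the situation behind the `Invalid event: SQ_SHUFFLE_REPORT at SUCCEEDED` error in the log above. `ShuffleReportCollector`, `Outcome` and `onReport` are hypothetical names. In Tajo itself this logic would live in `Stage` and its state machine, where ignoring late reports would presumably mean treating SQ_SHUFFLE_REPORT as a no-op on the SUCCEEDED state.

```java
/**
 * Hypothetical QueryMaster-side collector for stop-time hash shuffle reports.
 * Not Tajo code: it only models the counting done in finalizeShuffleReport
 * plus the two "ignore" rules discussed in TAJO-1560.
 */
public class ShuffleReportCollector {

  public enum Outcome { IGNORED, FAILED, WAITING, COMPLETED }

  private final int expectedTasks;  // plays the role of succeededObjectCount
  private int completedTasks = 0;   // plays the role of completedShuffleTasks
  private boolean finished = false; // true once FAILED or COMPLETED has been returned

  public ShuffleReportCollector(int expectedTasks) {
    this.expectedTasks = expectedTasks;
  }

  /** Handles one worker report; synchronized because reports may arrive from several threads. */
  public synchronized Outcome onReport(boolean success, int succeededTasks) {
    if (finished) {
      // Late report: the stage has already finalized, so drop it instead of
      // feeding it to a state machine that no longer accepts it.
      return Outcome.IGNORED;
    }
    if (!success) {
      finished = true;
      return Outcome.FAILED; // return early so a failed report is never also counted
    }
    if (succeededTasks == 0) {
      // The worker ran no task (e.g. it received ShouldDie); nothing to count.
      return Outcome.IGNORED;
    }
    completedTasks += succeededTasks;
    if (completedTasks >= expectedTasks) {
      finished = true;
      return Outcome.COMPLETED;
    }
    return Outcome.WAITING;
  }

  public synchronized int completedTasks() {
    return completedTasks;
  }
}
```

Replaying reports against `new ShuffleReportCollector(3)`: `onReport(true, 2)` returns `WAITING`, `onReport(true, 1)` returns `COMPLETED`, and a further `onReport(true, 0)` returns `IGNORED` instead of raising an invalid-transition error.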



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
