Jihoon Son created TAJO-1875:
--------------------------------

             Summary: Resource leak after a query failure
                 Key: TAJO-1875
                 URL: https://issues.apache.org/jira/browse/TAJO-1875
             Project: Tajo
          Issue Type: Bug
          Components: Scheduler
            Reporter: Jihoon Son


I'm testing the join on hdfs and JDBC storage, and found that the worker 
resource is not collected properly after a query failure.

Here is the log.
{noformat}
2015-09-23 01:22:36,776 INFO org.apache.tajo.worker.TaskImpl: Initializing: 
ta_1442938691159_0002_000002_000000_02
2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Stage 
ta_1442938691159_0002_000002_000000_02 is initialized
2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: 
true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: 
file:/disk2/tajo-tmp/q_1442938691159_0002/output/2/0_2
2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,778 INFO org.apache.tajo.storage.jdbc.JdbcScanner: 
Generated SQL: SELECT o_orderkey,o_custkey,o_orderdate,o_shippriority FROM 
orders WHERE o.o_orderdate < DATE '1995-03-15'
2015-09-23 01:22:36,783 ERROR org.apache.tajo.worker.TaskImpl: internal error: 
ERROR: missing FROM-clause entry for table "o"
  Position: 74
org.apache.tajo.exception.TajoInternalError: internal error: ERROR: missing 
FROM-clause entry for table "o"
  Position: 74
        at 
org.apache.tajo.storage.jdbc.JdbcScanner.executeQueryAndGetIter(JdbcScanner.java:263)
        at org.apache.tajo.storage.jdbc.JdbcScanner.next(JdbcScanner.java:116)
        at 
org.apache.tajo.engine.planner.physical.FullScanIterator.hasNext(FullScanIterator.java:39)
        at 
org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:292)
        at 
org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:102)
        at org.apache.tajo.worker.TaskImpl.run(TaskImpl.java:405)
        at org.apache.tajo.worker.TaskContainer.run(TaskContainer.java:65)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-09-23 01:22:36,783 INFO org.apache.tajo.worker.TaskImpl: 
ta_1442938691159_0002_000002_000000_02 completed. Worker's task counter - 
total:3, succeeded: 0, killed: 0, failed: 3
2015-09-23 01:22:36,784 ERROR org.apache.tajo.querymaster.TaskAttempt: 
ta_1442938691159_0002_000002_000000_02 FROM jihoon-desktop >> internal error: 
ERROR: missing FROM-clause entry for table "o"
  Position: 74
2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: 
====================================================================================
2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: >>> Task Failed: 
ta_1442938691159_0002_000002_000000_02, retry:false, attempts:3 <<<
2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: 
====================================================================================
2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
TaskScheduler schedulingThread stopped
2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Task Scheduler stopped
2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Stage: Stage completed 
- eb_1442938691159_0002_000002 (total=1, success=0, killed=0)
2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Task Scheduler stopped
2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Processing 
q_1442938691159_0002 of type STAGE_COMPLETED
2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Complete 
Stage[eb_1442938691159_0002_000002], State: FAILED, 1/2.
2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 1
2015-09-23 01:22:36,785 INFO 
org.apache.tajo.storage.HashShuffleAppenderManager: Close 
HashShuffleAppender:eb_1442938691159_0002_000002, not a hash shuffle
2015-09-23 01:22:36,786 ERROR org.apache.tajo.querymaster.Stage: Can't handle 
this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:FAILED, 
nextState:FAILED
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
SQ_SHUFFLE_REPORT at FAILED
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tajo.querymaster.Stage.handle(Stage.java:797)
        at 
org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:210)
        at 
org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:204)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
2015-09-23 01:22:36,786 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Task Scheduler stopped
2015-09-23 01:22:36,786 INFO org.apache.tajo.worker.TaskManager: Stopped 
execution block:eb_1442938691159_0002_000002
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing 
q_1442938691159_0002 of type STAGE_COMPLETED
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Complete 
Stage[eb_1442938691159_0002_000002], State: ERROR, 2/2.
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing 
q_1442938691159_0002 of type QUERY_COMPLETED
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: 
q_1442938691159_0002 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.QueryMasterTask: Query 
completion notified from q_1442938691159_0002 final state: QUERY_ERROR
2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Task basedir is 
created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 2
2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Initializing: 
ta_1442938691159_0002_000001_000002_00
2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,788 INFO org.apache.tajo.querymaster.QueryMasterTask: 
Stopping QueryMasterTask:q_1442938691159_0002
2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: * Stage 
ta_1442938691159_0002_000001_000002_00 is initialized
2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: 
true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: 
file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/2_0
2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: 
Cleanup resources of all workers. Query: q_1442938691159_0002, workers: 1
2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: 
Stopped QueryMasterTask:q_1442938691159_0002
2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Task basedir is 
created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Initializing: 
ta_1442938691159_0002_000001_000003_00
2015-09-23 01:22:36,792 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 3
2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Stage 
ta_1442938691159_0002_000001_000003_00 is initialized
2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: 
true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: 
file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/3_0
2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: 
==================================
2015-09-23 01:22:36,795 INFO org.apache.tajo.worker.TaskManager: QueryMaster 
Address:jihoon-desktop:28093
2015-09-23 01:22:36,797 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 4
2015-09-23 01:22:36,798 INFO org.apache.tajo.worker.TaskImpl: 
ta_1442938691159_0002_000001_000002_00 completed. Worker's task counter - 
total:3, succeeded: 2, killed: 0, failed: 1
2015-09-23 01:22:36,798 WARN org.apache.tajo.rpc.NettyClientBase: 
java.nio.channels.ClosedChannelException Try to reconnect 
:jihoon-desktop/127.0.1.1:28093
2015-09-23 01:22:36,801 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: 
Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 5
2015-09-23 01:22:36,992 INFO org.apache.tajo.util.history.HistoryWriter: Saving 
query summary: 
hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/query.hist
2015-09-23 01:22:37,026 INFO org.apache.tajo.util.history.HistoryWriter: Saving 
query unit: 
hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000001.hist
2015-09-23 01:22:37,042 INFO org.apache.tajo.util.history.HistoryWriter: Saving 
query unit: 
hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000002.hist
2015-09-23 01:22:37,559 INFO org.apache.tajo.util.history.HistoryWriter: 
Closing task history file: 
file:/tmp/tajo-jihoon/history/19700101/tasks/jihoon-desktop_28091/jihoon-desktop_28091_09_0.hist
2015-09-23 01:22:37,803 WARN 
org.apache.tajo.querymaster.QueryMasterManagerService: No QueryMasterTask: 
ta_1442938691159_0002_000001_000002_00
2015-09-23 01:22:46,797 FATAL org.apache.tajo.worker.TaskManager:
java.util.concurrent.TimeoutException
        at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:79)
        at 
org.apache.tajo.worker.TaskManager.createExecutionBlock(TaskManager.java:126)
        at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:178)
        at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:54)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to