[
https://issues.apache.org/jira/browse/TAJO-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895569#comment-13895569
]
Hyunsik Choi commented on TAJO-587:
-----------------------------------
There is probably a lot of room for improvement in the method
_scheduleRangeShuffledFetches()_. First of all, instead of keeping a URI for
each fetch, we should keep just a hostname and several integers indicating the
subquery id, task id, and attempt id. This should significantly reduce main
memory usage.
As a temporary workaround, you can also assign more memory via
TAJO_WORKER_HEAPSIZE. That may help, depending on your environment.
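To illustrate the suggestion above, here is a minimal sketch of a compact fetch
descriptor. It is not code from the Tajo source tree: the class name
CompactFetch, its field names, and the URL layout produced by toUri() are all
hypothetical and only show the idea of storing a shared hostname plus three
integers, and building the URI lazily when the fetch is actually issued.

```java
import java.net.URI;

/**
 * Hypothetical compact fetch descriptor (not a Tajo class).
 * Holds one reference to a hostname String plus three ints,
 * instead of a fully materialized java.net.URI per fetch.
 */
final class CompactFetch {
    final String host;      // can be shared by every fetch that targets the same worker
    final int subQueryId;
    final int taskId;
    final int attemptId;

    CompactFetch(String host, int subQueryId, int taskId, int attemptId) {
        this.host = host;
        this.subQueryId = subQueryId;
        this.taskId = taskId;
        this.attemptId = attemptId;
    }

    /**
     * Builds the URI on demand. The query-string layout is an assumption
     * made for this sketch, not the format Tajo uses.
     */
    URI toUri() {
        return URI.create("http://" + host + "/?sid=" + subQueryId
                + "&ta=" + taskId + "_" + attemptId);
    }
}
```

With this shape, a scheduler could keep a list of CompactFetch objects and call
toUri() only for the fetches a worker is about to run, so the URI objects are
short-lived instead of all being resident in the query master at once.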
> Query is hanging when OutOfMemoryError occurs in the query master
> -----------------------------------------------------------------
>
> Key: TAJO-587
> URL: https://issues.apache.org/jira/browse/TAJO-587
> Project: Tajo
> Issue Type: Bug
> Components: tajo master
> Reporter: Jihoon Son
> Fix For: 0.8-incubating
>
>
> See the title. When I run a simple sort query against a 1TB table, the
> query hangs and never finishes.
> {noformat}
> tajo> select l_orderkey from lineitem order by l_orderkey
> 2014-02-05 17:20:52,339 FATAL master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:dispatch(143)) - Error in dispatcher thread:SUBQUERY_COMPLETED
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.net.URI.create(URI.java:857)
> at org.apache.tajo.master.querymaster.Repartitioner.scheduleRangeShuffledFetches(Repartitioner.java:342)
> at org.apache.tajo.master.querymaster.Repartitioner.scheduleFragmentsForNonLeafTasks(Repartitioner.java:261)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.schedule(SubQuery.java:680)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:523)
> at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:504)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.master.querymaster.SubQuery.handle(SubQuery.java:481)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.executeNextBlock(Query.java:311)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:357)
> at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:297)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.master.querymaster.Query.handle(Query.java:584)
> at org.apache.tajo.master.querymaster.Query.handle(Query.java:58)
> at org.apache.tajo.master.TajoAsyncDispatcher.dispatch(TajoAsyncDispatcher.java:137)
> at org.apache.tajo.master.TajoAsyncDispatcher$1.run(TajoAsyncDispatcher.java:79)
> at java.lang.Thread.run(Thread.java:701)
> 2014-02-05 17:20:52,339 WARN querymaster.QueryMaster (QueryMaster.java:run(459)) - Query q_1391587770871_0001 stopped cause query sesstion timeout: 384113 ms
> 2014-02-05 17:20:52,339 INFO querymaster.QueryMasterTask (QueryMasterTask.java:stop(168)) - Stopping QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:20:52,346 INFO master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:stop(122)) - AsyncDispatcher stopped:q_1391587770871_0001
> 2014-02-05 17:20:52,351 INFO querymaster.QueryMasterTask (QueryMasterTask.java:stop(198)) - Stopped QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:23:28,614 ERROR worker.TajoWorker (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
> {noformat}