[
https://issues.apache.org/jira/browse/TAJO-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi updated TAJO-15:
-----------------------------
Attachment: TAJO-15.patch
I've attached the patch. The bug occurs because the main thread hangs, so
this patch adds an explicit System.exit(0) at the end of the main method.
It works on both Linux and Mac OS X, and all unit tests pass.
In addition, this patch cleans up some unused variables and methods.
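For anyone following along, here is a minimal sketch of the general idea, not
the patch itself. It assumes the hang comes from a non-daemon thread that is
still alive when main returns, which is the usual reason an explicit
System.exit(0) is needed. The names ExitSketch and startLingeringWorker are
made up for illustration and do not appear in Tajo.

```java
import java.util.concurrent.CountDownLatch;

public class ExitSketch {
    // Hypothetical stand-in for a background service thread that never
    // finishes on its own (it blocks until someone interrupts it).
    static Thread startLingeringWorker() {
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    new CountDownLatch(1).await(); // never counted down
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore flag, then return
                }
            }
        }, "lingering-worker");
        worker.setDaemon(false); // non-daemon: the JVM waits for it before exiting
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        startLingeringWorker();
        System.out.println("main work done");
        // Without this call the process would stay up, waiting on the worker.
        System.exit(0);
    }
}
```

System.exit(0) is a blunt tool: it ends the process regardless of what other
threads are doing, so it fits the end of a test driver's main method better
than library code. Marking background threads as daemon or shutting them down
explicitly would be the gentler alternative.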
> The Integration test is getting hanged on Mac OS X.
> ---------------------------------------------------
>
> Key: TAJO-15
> URL: https://issues.apache.org/jira/browse/TAJO-15
> Project: Tajo
> Issue Type: Bug
> Environment: OS: Mac 10.8.3
> Both JVMs:
> {noformat}
> java version "1.6.0_43"
> Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
> Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
> {noformat}
> {noformat}
> java version "1.7.0_10"
> Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
> Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
> {noformat}
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Attachments: TAJO-15.patch
>
>
> The integration test hangs on Mac OS X. Below are the unit test logs
> reported by Ashish.
> http://markmail.org/message/lknrqecc27v4thbb
> {noformat}
> 2013-03-28 16:42:39,039 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(776)) - Application
> appattempt_1364469093530_0002_000001 released container
> container_1364469093530_0002_01_000007 on node: host: a.b.c.d:60941
> #containers=0 available=4096 used=0 with event: FINISHED
> 2013-03-28 16:42:39,235 INFO rmcontainer.RMContainerImpl
> (RMContainerImpl.java:handle(220)) - container_1364469093530_0002_01_000008
> Container Transitioned from ALLOCATED to ACQUIRED
> 2013-03-28 16:42:39,236 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 1
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(175)) -
> ================================================================
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(177)) - > Container Id:
> container_1364469093530_0002_01_000008
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(178)) - > Node Id:
> a.b.c.d:60945
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(179)) - > Resource (Mem): 3072
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(180)) - > State : NEW
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(181)) - > Priority: 92
> 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(183)) -
> ================================================================
> 2013-03-28 16:42:39,238 INFO master.SubQuery
> (SubQuery.java:transition(713)) - SubQuery
> (sq_1364469093530_0002_000001_27) has 1 containers!
> 2013-03-28 16:42:39,238 INFO master.TaskRunnerLauncherImpl
> (TaskRunnerLauncherImpl.java:launch(393)) - Launching Container with Id:
> container_1364469093530_0002_01_000008
> 2013-03-28 16:42:39,239 INFO master.TaskRunnerLauncherImpl
> (TaskRunnerLauncherImpl.java:createContainerLaunchContext(301)) - Completed
> setting up taskrunner command ${JAVA_HOME}/bin/java -Xmx2000m
> tajo.worker.TaskRunner a.b.c.d 58243 sq_1364469093530_0002_000001_27
> a.b.c.d:60945 container_1364469093530_0002_01_000008 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
> 2013-03-28 16:42:39,244 INFO containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:startContainer(402)) - Start request for
> container_1364469093530_0002_01_000008 by user xxxxxxx
> 2013-03-28 16:42:39,245 INFO nodemanager.NMAuditLogger
> (NMAuditLogger.java:logSuccess(89)) - USER=xxxxxxx IP=a.b.c.d OPERATION=Start
> Container Request TARGET=ContainerManageImpl RESULT=SUCCESS
> APPID=application_1364469093530_0002
> CONTAINERID=container_1364469093530_0002_01_000008
> 2013-03-28 16:42:39,245 INFO application.Application
> (ApplicationImpl.java:transition(255)) - Adding
> container_1364469093530_0002_01_000008 to application
> application_1364469093530_0002
> 2013-03-28 16:42:39,246 INFO container.Container
> (ContainerImpl.java:handle(835)) - Container
> container_1364469093530_0002_01_000008 transitioned from NEW to LOCALIZING
> 2013-03-28 16:42:39,246 INFO master.TaskRunnerLauncherImpl
> (TaskRunnerLauncherImpl.java:launch(424)) - PullServer port returned by
> ContainerManager for container_1364469093530_0002_01_000008 : 60947
> 2013-03-28 16:42:39,246 INFO containermanager.AuxServices
> (AuxServices.java:handle(160)) - Got event APPLICATION_INIT for appId
> application_1364469093530_0002
> 2013-03-28 16:42:39,246 INFO containermanager.AuxServices
> (AuxServices.java:handle(164)) - Got APPLICATION_INIT for service
> tajo.pullserver
> 2013-03-28 16:42:39,246 INFO master.Query (Query.java:handle(514)) -
> Processing q_1364469093530_0002_000001 of type INIT_COMPLETED
> 2013-03-28 16:42:39,246 INFO container.Container
> (ContainerImpl.java:handle(835)) - Container
> container_1364469093530_0002_01_000008 transitioned from LOCALIZING to
> LOCALIZED
> 2013-03-28 16:42:39,247 INFO util.RackResolver
> (RackResolver.java:coreResolve(100)) - Resolved L-IDC77TDV7M-M.local to
> /default-rack
> 2013-03-28 16:42:39,339 INFO container.Container
> (ContainerImpl.java:handle(835)) - Container
> container_1364469093530_0002_01_000008 transitioned from LOCALIZED to
> RUNNING
> 2013-03-28 16:42:39,340 INFO monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:isEnabled(168)) - ResourceCalculatorPlugin is
> unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-03-28 16:42:39,535 INFO nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:launchContainer(175)) - launchContainer:
> [bash,
> /Users/xxxxxxx/opensource/tajo/incubator-tajo/tajo-core/tajo-core-backend/target/tajo.TajoTestingCluster/tajo.TajoTestingCluster-localDir-nm-1_0/usercache/xxxxxxx/appcache/application_1364469093530_0002/container_1364469093530_0002_01_000008/default_container_executor.sh]
> 2013-03-28 16:42:39,903 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:getNodeStatus(265)) - Sending out status for
> container: container_id {, app_attempt_id {, application_id {, id: 2,
> cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state:
> C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-03-28 16:42:39,904 INFO rmcontainer.RMContainerImpl
> (RMContainerImpl.java:handle(220)) - container_1364469093530_0002_01_000008
> Container Transitioned from ACQUIRED to RUNNING
> 2013-03-28 16:42:40,020 WARN nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:launchContainer(193)) - Exit code from task
> is : 1
> 2013-03-28 16:42:40,021 INFO nodemanager.ContainerExecutor
> (ContainerExecutor.java:logOutput(167)) -
> 2013-03-28 16:42:40,021 WARN launcher.ContainerLaunch
> (ContainerLaunch.java:call(274)) - Container exited with a non-zero exit
> code 1
> 2013-03-28 16:42:40,021 INFO container.Container
> (ContainerImpl.java:handle(835)) - Container
> container_1364469093530_0002_01_000008 transitioned from RUNNING to
> EXITED_WITH_FAILURE
> 2013-03-28 16:42:40,021 INFO launcher.ContainerLaunch
> (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container
> container_1364469093530_0002_01_000008
> 2013-03-28 16:42:40,040 INFO nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:deleteAsUser(273)) - Deleting absolute path
> :
> /Users/xxxxxxx/opensource/tajo/incubator-tajo/tajo-core/tajo-core-backend/target/tajo.TajoTestingCluster/tajo.TajoTestingCluster-localDir-nm-1_0/usercache/xxxxxxx/appcache/application_1364469093530_0002/container_1364469093530_0002_01_000008
> 2013-03-28 16:42:40,040 WARN nodemanager.NMAuditLogger
> (NMAuditLogger.java:logFailure(150)) - USER=xxxxxxx OPERATION=Container
> Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container
> failed with state: EXITED_WITH_FAILURE APPID=application_1364469093530_0002
> CONTAINERID=container_1364469093530_0002_01_000008
> 2013-03-28 16:42:40,041 INFO container.Container
> (ContainerImpl.java:handle(835)) - Container
> container_1364469093530_0002_01_000008 transitioned from
> EXITED_WITH_FAILURE to DONE
> 2013-03-28 16:42:40,041 INFO application.Application
> (ApplicationImpl.java:transition(298)) - Removing
> container_1364469093530_0002_01_000008 from application
> application_1364469093530_0002
> 2013-03-28 16:42:40,041 INFO monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:isEnabled(168)) - ResourceCalculatorPlugin is
> unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-03-28 16:42:40,241 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:40,241 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:40,905 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:getNodeStatus(265)) - Sending out status for
> container: container_id {, app_attempt_id {, application_id {, id: 2,
> cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state:
> C_COMPLETE, diagnostics: "\n", exit_status: 1,
> 2013-03-28 16:42:40,905 INFO nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:getNodeStatus(271)) - Removed completed
> container container_1364469093530_0002_01_000008
> 2013-03-28 16:42:40,906 INFO rmcontainer.RMContainerImpl
> (RMContainerImpl.java:handle(220)) - container_1364469093530_0002_01_000008
> Container Transitioned from RUNNING to COMPLETED
> 2013-03-28 16:42:40,906 INFO fica.FiCaSchedulerApp
> (FiCaSchedulerApp.java:containerCompleted(219)) - Completed container:
> container_1364469093530_0002_01_000008 in state: COMPLETED event:FINISHED
> 2013-03-28 16:42:40,906 INFO resourcemanager.RMAuditLogger
> (RMAuditLogger.java:logSuccess(98)) - USER=xxxxxxx OPERATION=AM Released
> Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1364469093530_0002
> CONTAINERID=container_1364469093530_0002_01_000008
> 2013-03-28 16:42:40,906 INFO fica.FiCaSchedulerNode
> (FiCaSchedulerNode.java:releaseContainer(150)) - Released container
> container_1364469093530_0002_01_000008 of capacity <memory:3072, vCores:1>
> on host a.b.c.d:60945, which currently has 0 containers, <memory:0,
> vCores:0> used and <memory:4096, vCores:16> available, release
> resources=true
> 2013-03-28 16:42:40,906 INFO capacity.LeafQueue
> (LeafQueue.java:releaseResource(1441)) - default used=<memory:0, vCores:0>
> numContainers=0 user=xxxxxxx user-resources=<memory:0, vCores:0>
> 2013-03-28 16:42:40,907 INFO capacity.LeafQueue
> (LeafQueue.java:completedContainer(1385)) - completedContainer
> container=Container: [ContainerId: container_1364469093530_0002_01_000008,
> NodeId: a.b.c.d:60945, NodeHttpAddress: a.b.c.d:60948, Resource:
> <memory:3072, vCores:1>, Priority: 92, State: NEW, Token: null, Status:
> container_id {, app_attempt_id {, application_id {, id: 2,
> cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state:
> C_COMPLETE, diagnostics: "\n", exit_status: 1, ] resource=<memory:3072,
> vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:0, vCores:0>usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0 usedCapacity=0.0
> absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:12288,
> vCores:48>
> 2013-03-28 16:42:40,907 INFO capacity.ParentQueue
> (ParentQueue.java:completedContainer(696)) - completedContainer queue=root
> usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0>
> cluster=<memory:12288, vCores:48>
> 2013-03-28 16:42:40,907 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(776)) - Application
> appattempt_1364469093530_0002_000001 released container
> container_1364469093530_0002_01_000008 on node: host: a.b.c.d:60945
> #containers=0 available=4096 used=0 with event: FINISHED
> 2013-03-28 16:42:41,242 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:41,242 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:42,245 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:42,246 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:43,248 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:43,249 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:44,251 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:44,252 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:45,255 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:45,256 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:46,259 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:46,260 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:47,263 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> 2013-03-28 16:42:47,264 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated
> Containers: 0
> 2013-03-28 16:42:48,267 INFO rm.RMContainerAllocator
> (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource:
> <memory:6144, vCores:-1>
> {noformat}