Hey guys, fascinating discussion, and thanks for the explanation, Hyunsik. I am training on a new job right now and swamped, but I'm excited to get a breather and take a look at this interesting problem.
As for the code: it's really clean, nice work everyone! I'm really impressed by the clarity; it's easy to read. There are little things to fix in any codebase, and we'll get them done. This is especially interesting to me since the app master is launching different task runners for the subqueries and managing their lifecycles. That definitely adds a layer of complexity to the app master's job. I will take a deeper look when I get the chance and see if I notice anything relating to TAJO-26 that might be helpful.

On Wed, Apr 3, 2013 at 9:54 PM, Tanujit Ghosh (JIRA) <[email protected]> wrote: > > [ > https://issues.apache.org/jira/browse/TAJO-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621648#comment-13621648] > > Tanujit Ghosh commented on TAJO-15: > ----------------------------------- > > Hi, > > I have raised TAJO-26 issue, the environment i'm on is fedora 17 (linux > kernel 3.8.4), sun java 1.6.0_41. > > Yes i'm running mvn verify from the shell. > > From what i see in the log, there is an error with the data file not being > found, maybe i have missed some setting which needs to be done. > > > > > > > -- > Regards, > Tanujit > > > > The Integration test is getting hanged on Mac OS X. 
> > --------------------------------------------------- > > > > Key: TAJO-15 > > URL: https://issues.apache.org/jira/browse/TAJO-15 > > Project: Tajo > > Issue Type: Bug > > Environment: OS: Mac 10.8.3 > > Both JVMs: > > {noformat} > > java version "1.6.0_43" > > Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203) > > Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode) > > {noformat} > > {noformat} > > java version "1.7.0_10" > > Java(TM) SE Runtime Environment (build 1.7.0_10-b18) > > Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode) > > {noformat} > > Reporter: Hyunsik Choi > > Assignee: Hyunsik Choi > > Fix For: 0.2-incubating > > > > Attachments: TAJO-15.patch > > > > > > The Integration test is getting hanged on Mac OS X. The below is the > unit test logs reported by Ashish. > > http://markmail.org/message/lknrqecc27v4thbb > > {noformat} > > 2013-03-28 16:42:39,039 INFO capacity.CapacityScheduler > > (CapacityScheduler.java:completedContainer(776)) - Application > > appattempt_1364469093530_0002_000001 released container > > container_1364469093530_0002_01_000007 on node: host: a.b.c.d:60941 > > #containers=0 available=4096 used=0 with event: FINISHED > > 2013-03-28 16:42:39,235 INFO rmcontainer.RMContainerImpl > > (RMContainerImpl.java:handle(220)) - > container_1364469093530_0002_01_000008 > > Container Transitioned from ALLOCATED to ACQUIRED > > 2013-03-28 16:42:39,236 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 1 > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(175)) - > > ================================================================ > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > 
(RMContainerAllocator.java:makeRemoteRequest(177)) - > Container Id: > > container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(178)) - > Node Id: > > a.b.c.d:60945 > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(179)) - > Resource (Mem): > 3072 > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(180)) - > State : NEW > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(181)) - > Priority: 92 > > 2013-03-28 16:42:39,237 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(183)) - > > ================================================================ > > 2013-03-28 16:42:39,238 INFO master.SubQuery > > (SubQuery.java:transition(713)) - SubQuery > > (sq_1364469093530_0002_000001_27) has 1 containers! > > 2013-03-28 16:42:39,238 INFO master.TaskRunnerLauncherImpl > > (TaskRunnerLauncherImpl.java:launch(393)) - Launching Container with Id: > > container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:39,239 INFO master.TaskRunnerLauncherImpl > > (TaskRunnerLauncherImpl.java:createContainerLaunchContext(301)) - > Completed > > setting up taskrunner command ${JAVA_HOME}/bin/java -Xmx2000m > > tajo.worker.TaskRunner a.b.c.d 58243 sq_1364469093530_0002_000001_27 > > a.b.c.d:60945 container_1364469093530_0002_01_000008 1><LOG_DIR>/stdout > > 2><LOG_DIR>/stderr > > 2013-03-28 16:42:39,244 INFO containermanager.ContainerManagerImpl > > (ContainerManagerImpl.java:startContainer(402)) - Start request for > > container_1364469093530_0002_01_000008 by user xxxxxxx > > 2013-03-28 16:42:39,245 INFO nodemanager.NMAuditLogger > > (NMAuditLogger.java:logSuccess(89)) - USER=xxxxxxx IP=a.b.c.d > OPERATION=Start > > Container Request TARGET=ContainerManageImpl RESULT=SUCCESS > > 
APPID=application_1364469093530_0002 > > CONTAINERID=container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:39,245 INFO application.Application > > (ApplicationImpl.java:transition(255)) - Adding > > container_1364469093530_0002_01_000008 to application > > application_1364469093530_0002 > > 2013-03-28 16:42:39,246 INFO container.Container > > (ContainerImpl.java:handle(835)) - Container > > container_1364469093530_0002_01_000008 transitioned from NEW to > LOCALIZING > > 2013-03-28 16:42:39,246 INFO master.TaskRunnerLauncherImpl > > (TaskRunnerLauncherImpl.java:launch(424)) - PullServer port returned by > > ContainerManager for container_1364469093530_0002_01_000008 : 60947 > > 2013-03-28 16:42:39,246 INFO containermanager.AuxServices > > (AuxServices.java:handle(160)) - Got event APPLICATION_INIT for appId > > application_1364469093530_0002 > > 2013-03-28 16:42:39,246 INFO containermanager.AuxServices > > (AuxServices.java:handle(164)) - Got APPLICATION_INIT for service > > tajo.pullserver > > 2013-03-28 16:42:39,246 INFO master.Query (Query.java:handle(514)) - > > Processing q_1364469093530_0002_000001 of type INIT_COMPLETED > > 2013-03-28 16:42:39,246 INFO container.Container > > (ContainerImpl.java:handle(835)) - Container > > container_1364469093530_0002_01_000008 transitioned from LOCALIZING to > > LOCALIZED > > 2013-03-28 16:42:39,247 INFO util.RackResolver > > (RackResolver.java:coreResolve(100)) - Resolved L-IDC77TDV7M-M.local to > > /default-rack > > 2013-03-28 16:42:39,339 INFO container.Container > > (ContainerImpl.java:handle(835)) - Container > > container_1364469093530_0002_01_000008 transitioned from LOCALIZED to > > RUNNING > > 2013-03-28 16:42:39,340 INFO monitor.ContainersMonitorImpl > > (ContainersMonitorImpl.java:isEnabled(168)) - ResourceCalculatorPlugin is > > unavailable on this system. > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > > is disabled. 
> > 2013-03-28 16:42:39,535 INFO nodemanager.DefaultContainerExecutor > > (DefaultContainerExecutor.java:launchContainer(175)) - launchContainer: > > [bash, > > > /Users/xxxxxxx/opensource/tajo/incubator-tajo/tajo-core/tajo-core-backend/target/tajo.TajoTestingCluster/tajo.TajoTestingCluster-localDir-nm-1_0/usercache/xxxxxxx/appcache/application_1364469093530_0002/container_1364469093530_0002_01_000008/default_container_executor.sh] > > 2013-03-28 16:42:39,903 INFO nodemanager.NodeStatusUpdaterImpl > > (NodeStatusUpdaterImpl.java:getNodeStatus(265)) - Sending out status for > > container: container_id {, app_attempt_id {, application_id {, id: 2, > > cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state: > > C_RUNNING, diagnostics: "", exit_status: -1000, > > 2013-03-28 16:42:39,904 INFO rmcontainer.RMContainerImpl > > (RMContainerImpl.java:handle(220)) - > container_1364469093530_0002_01_000008 > > Container Transitioned from ACQUIRED to RUNNING > > 2013-03-28 16:42:40,020 WARN nodemanager.DefaultContainerExecutor > > (DefaultContainerExecutor.java:launchContainer(193)) - Exit code from > task > > is : 1 > > 2013-03-28 16:42:40,021 INFO nodemanager.ContainerExecutor > > (ContainerExecutor.java:logOutput(167)) - > > 2013-03-28 16:42:40,021 WARN launcher.ContainerLaunch > > (ContainerLaunch.java:call(274)) - Container exited with a non-zero exit > > code 1 > > 2013-03-28 16:42:40,021 INFO container.Container > > (ContainerImpl.java:handle(835)) - Container > > container_1364469093530_0002_01_000008 transitioned from RUNNING to > > EXITED_WITH_FAILURE > > 2013-03-28 16:42:40,021 INFO launcher.ContainerLaunch > > (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container > > container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:40,040 INFO nodemanager.DefaultContainerExecutor > > (DefaultContainerExecutor.java:deleteAsUser(273)) - Deleting absolute > path > > : > > > 
/Users/xxxxxxx/opensource/tajo/incubator-tajo/tajo-core/tajo-core-backend/target/tajo.TajoTestingCluster/tajo.TajoTestingCluster-localDir-nm-1_0/usercache/xxxxxxx/appcache/application_1364469093530_0002/container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:40,040 WARN nodemanager.NMAuditLogger > > (NMAuditLogger.java:logFailure(150)) - USER=xxxxxxx OPERATION=Container > > Finished - Failed TARGET=ContainerImpl RESULT=FAILURE > DESCRIPTION=Container > > failed with state: EXITED_WITH_FAILURE > APPID=application_1364469093530_0002 > > CONTAINERID=container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:40,041 INFO container.Container > > (ContainerImpl.java:handle(835)) - Container > > container_1364469093530_0002_01_000008 transitioned from > > EXITED_WITH_FAILURE to DONE > > 2013-03-28 16:42:40,041 INFO application.Application > > (ApplicationImpl.java:transition(298)) - Removing > > container_1364469093530_0002_01_000008 from application > > application_1364469093530_0002 > > 2013-03-28 16:42:40,041 INFO monitor.ContainersMonitorImpl > > (ContainersMonitorImpl.java:isEnabled(168)) - ResourceCalculatorPlugin is > > unavailable on this system. > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > > is disabled. 
> > 2013-03-28 16:42:40,241 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:40,241 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:40,905 INFO nodemanager.NodeStatusUpdaterImpl > > (NodeStatusUpdaterImpl.java:getNodeStatus(265)) - Sending out status for > > container: container_id {, app_attempt_id {, application_id {, id: 2, > > cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state: > > C_COMPLETE, diagnostics: "\n", exit_status: 1, > > 2013-03-28 16:42:40,905 INFO nodemanager.NodeStatusUpdaterImpl > > (NodeStatusUpdaterImpl.java:getNodeStatus(271)) - Removed completed > > container container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:40,906 INFO rmcontainer.RMContainerImpl > > (RMContainerImpl.java:handle(220)) - > container_1364469093530_0002_01_000008 > > Container Transitioned from RUNNING to COMPLETED > > 2013-03-28 16:42:40,906 INFO fica.FiCaSchedulerApp > > (FiCaSchedulerApp.java:containerCompleted(219)) - Completed container: > > container_1364469093530_0002_01_000008 in state: COMPLETED event:FINISHED > > 2013-03-28 16:42:40,906 INFO resourcemanager.RMAuditLogger > > (RMAuditLogger.java:logSuccess(98)) - USER=xxxxxxx OPERATION=AM Released > > Container TARGET=SchedulerApp RESULT=SUCCESS > > APPID=application_1364469093530_0002 > > CONTAINERID=container_1364469093530_0002_01_000008 > > 2013-03-28 16:42:40,906 INFO fica.FiCaSchedulerNode > > (FiCaSchedulerNode.java:releaseContainer(150)) - Released container > > container_1364469093530_0002_01_000008 of capacity <memory:3072, > vCores:1> > > on host a.b.c.d:60945, which currently has 0 containers, <memory:0, > > vCores:0> used and <memory:4096, vCores:16> available, release > > resources=true > > 2013-03-28 16:42:40,906 INFO capacity.LeafQueue > > (LeafQueue.java:releaseResource(1441)) 
- default used=<memory:0, > vCores:0> > > numContainers=0 user=xxxxxxx user-resources=<memory:0, vCores:0> > > 2013-03-28 16:42:40,907 INFO capacity.LeafQueue > > (LeafQueue.java:completedContainer(1385)) - completedContainer > > container=Container: [ContainerId: > container_1364469093530_0002_01_000008, > > NodeId: a.b.c.d:60945, NodeHttpAddress: a.b.c.d:60948, Resource: > > <memory:3072, vCores:1>, Priority: 92, State: NEW, Token: null, Status: > > container_id {, app_attempt_id {, application_id {, id: 2, > > cluster_timestamp: 1364469093530, }, attemptId: 1, }, id: 8, }, state: > > C_COMPLETE, diagnostics: "\n", exit_status: 1, ] resource=<memory:3072, > > vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, > > usedResources=<memory:0, vCores:0>usedCapacity=0.0, > > absoluteUsedCapacity=0.0, numApps=1, numContainers=0 usedCapacity=0.0 > > absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:12288, > > vCores:48> > > 2013-03-28 16:42:40,907 INFO capacity.ParentQueue > > (ParentQueue.java:completedContainer(696)) - completedContainer > queue=root > > usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> > > cluster=<memory:12288, vCores:48> > > 2013-03-28 16:42:40,907 INFO capacity.CapacityScheduler > > (CapacityScheduler.java:completedContainer(776)) - Application > > appattempt_1364469093530_0002_000001 released container > > container_1364469093530_0002_01_000008 on node: host: a.b.c.d:60945 > > #containers=0 available=4096 used=0 with event: FINISHED > > 2013-03-28 16:42:41,242 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:41,242 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:42,245 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 
2013-03-28 16:42:42,246 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:43,248 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:43,249 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:44,251 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:44,252 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:45,255 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:45,256 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:46,259 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:46,260 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:47,263 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > 2013-03-28 16:42:47,264 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(173)) - Num of Allocated > > Containers: 0 > > 2013-03-28 16:42:48,267 INFO rm.RMContainerAllocator > > (RMContainerAllocator.java:makeRemoteRequest(172)) - Available Resource: > > <memory:6144, vCores:-1> > > {noformat} > > -- > This message is automatically generated by JIRA. 
