[ https://issues.apache.org/jira/browse/GOBBLIN-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Tiwari reassigned GOBBLIN-145:
---------------------------------------

       Assignee: Abhishek Tiwari
    Component/s: gobblin-yarn

> Gobblin Yarn fails to shutdown all containers
> ---------------------------------------------
>
>                 Key: GOBBLIN-145
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-145
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-yarn
>            Reporter: Joel Baranick
>            Assignee: Abhishek Tiwari
>              Labels: Bug:Generic, LaunchType:Yarn
>
> While shutting down the application, not all containers are shut down. The 
> following are the logs for the failure:
> ```
> 2016-02-02 21:10:05 UTC ERROR [pool-142-thread-1] 
> gobblin.yarn.GobblinApplicationMaster  - Timeout in stopping the service 
> manager
> java.util.concurrent.TimeoutException: Timeout waiting for the services to 
> stop.
>     at 
> com.google.common.util.concurrent.ServiceManager.awaitStopped(ServiceManager.java:338)
>     at 
> gobblin.yarn.GobblinApplicationMaster.stop(GobblinApplicationMaster.java:208)
>     at 
> gobblin.yarn.GobblinApplicationMaster.handleApplicationMasterShutdownRequest(GobblinApplicationMaster.java:287)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at 
> com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
>     at 
> com.google.common.eventbus.SynchronizedEventHandler.handleEvent(SynchronizedEventHandler.java:47)
>     at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
>     at 
> com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
>     at com.google.common.eventbus.EventBus.post(EventBus.java:267)
>     at 
> gobblin.yarn.GobblinApplicationMaster$ControllerShutdownMessageHandlerFactory$ControllerShutdownMessageHandler$1.run(GobblinApplicationMaster.java:445)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down 
> HelixTaskExecutor
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset 
> HelixTaskExecutor
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.monitoring.mbeans.MessageQueueMonitor  - Unregistering 
> ClusterStatus: cluster=GobblinYarn,messageQueue=GobblinApplicationMaster
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: CONTROLLER_MSG, pool: 
> java.util.concurrent.ThreadPoolExecutor@7394d9eb[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@7394d9eb[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: SCHEDULER_MSG, pool: 
> java.util.concurrent.ThreadPoolExecutor@316f0e95[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@316f0e95[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: PARTICIPANT_ERROR_REPORT, pool: 
> java.util.concurrent.ThreadPoolExecutor@7226718d[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@7226718d[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: USER_DEFINE_MSG, pool: 
> java.util.concurrent.ThreadPoolExecutor@39e91a87[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@39e91a87[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: SHUTDOWN, pool: 
> java.util.concurrent.ThreadPoolExecutor@76ccb2c4[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@76ccb2c4[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset exectuor for 
> msgType: TASK_REPLY, pool: 
> java.util.concurrent.ThreadPoolExecutor@78ec9eca[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutting down pool: 
> java.util.concurrent.ThreadPoolExecutor@78ec9eca[Running, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Shutdown 
> HelixTaskExecutor finished
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.helix.messaging.handling.HelixTaskExecutor  - Reset 
> HelixTaskExecutor
> 2016-02-02 21:10:05 UTC INFO  
> [ZkClient-EventThread-35-stage.zk.int.data.ensighten.com:2181] 
> org.I0Itec.zkclient.ZkEventThread  - Terminate ZkClient event thread.
> 2016-02-02 21:10:05 UTC INFO  [pool-142-thread-1] 
> org.apache.zookeeper.ZooKeeper  - Session: 0xd451accea96b1186 closed
> 2016-02-02 21:10:05 UTC INFO  [main-EventThread] 
> org.apache.zookeeper.ClientCnxn$EventThread  - EventThread shut down
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454445160766
>  does not exist
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454444218366
>  does not exist
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443560377
>  does not exist
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443440378
>  does not exist
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454446116585
>  does not exist
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-5] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454445160766: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-2] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443440378: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443800384
>  does not exist
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-7] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454446116585: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-6] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443560377: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-3] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454444218366: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-8] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443800384: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443980357
>  does not exist
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443680334
>  does not exist
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-1] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443980357: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-10] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443680334: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443320333
>  does not exist
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-9] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443320333: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC INFO  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Stopping the 
> TaskStateCollectorService
> 2016-02-02 21:10:06 UTC WARN  [TaskStateCollectorService STOPPING] 
> gobblin.runtime.TaskStateCollectorService  - Output task state path 
> hdfs://SERVER.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1454431761767_0015/_taskstates/job_JOB_NAME_1454443920554
>  does not exist
> 2016-02-02 21:10:06 UTC ERROR [DefaultQuartzScheduler_Worker-4] 
> gobblin.runtime.AbstractJobLauncher  - Failed to launch and run job 
> job_JOB_NAME_1454443920554: org.apache.helix.HelixException: HelixManager is 
> not connected. Call HelixManager#connect()
> org.apache.helix.HelixException: HelixManager is not connected. Call 
> HelixManager#connect()
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:292)
>     at 
> org.apache.helix.manager.zk.ZKHelixManager.getHelixPropertyStore(ZKHelixManager.java:644)
>     at org.apache.helix.task.TaskUtil.getWorkflowContext(TaskUtil.java:256)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.waitForJobCompletion(GobblinHelixJobLauncher.java:251)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.runWorkUnits(GobblinHelixJobLauncher.java:151)
>     at 
> gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:251)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:332)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-7] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@61de017e[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-7] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@61de017e[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-2] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@2452b0a3[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-2] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@2452b0a3[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-2] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@29138466[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-2] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@29138466[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-2] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-7] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@34959ee0[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-7] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@34959ee0[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-7] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-8] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@b4292e5[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-6] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@2aa79345[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-8] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@b4292e5[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-6] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@2aa79345[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-8] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@23c27ed1[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-8] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@23c27ed1[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-8] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-6] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@52dcb5f7[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-6] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@52dcb5f7[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-6] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-3] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@328a7eb9[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-3] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@328a7eb9[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-5] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@5a29ab8d[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-5] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@5a29ab8d[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-3] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@1d48efd4[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-3] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@1d48efd4[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-3] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-5] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@534b3f08[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-5] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@534b3f08[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-5] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-10] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@653be4b0[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-10] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@653be4b0[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-9] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@3f2312bc[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-9] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@3f2312bc[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-9] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@b63fa9f[Shutting down, pool size = 1, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-9] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@b63fa9f[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-10] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@50dae290[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-10] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@50dae290[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-9] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:06 UTC INFO  [DefaultQuartzScheduler_Worker-10] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-4] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@1932b09[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-4] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@1932b09[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-4] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@670474d4[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-4] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@670474d4[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-4] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@63753308[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@63753308[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> gobblin.util.ExecutorsUtils  - Attempting to shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@61e4879d[Shutting down, pool size = 
> 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> gobblin.util.ExecutorsUtils  - Successfully shutdown ExecutorService: 
> java.util.concurrent.ThreadPoolExecutor@61e4879d[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 1]
> 2016-02-02 21:10:07 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> org.quartz.core.JobRunShell  - Job nexus.JOB_NAME threw a 
> JobExecutionException: 
> org.quartz.JobExecutionException: java.lang.NullPointerException [See nested 
> exception: java.lang.NullPointerException]
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: java.lang.NullPointerException
>     at org.apache.helix.manager.zk.ZkClient$12.call(ZkClient.java:412)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     at org.apache.helix.manager.zk.ZkClient.asyncExists(ZkClient.java:408)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:938)
>     at 
> org.apache.helix.manager.zk.ZkBaseDataAccessor.exists(ZkBaseDataAccessor.java:909)
>     at org.apache.helix.manager.zk.ZKUtil.isClusterSetup(ZKUtil.java:70)
>     at org.apache.helix.ConfigAccessor.getKeys(ConfigAccessor.java:482)
>     at org.apache.helix.task.TaskUtil.getResourceConfigMap(TaskUtil.java:464)
>     at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:108)
>     at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:284)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.executeCancellation(GobblinHelixJobLauncher.java:171)
>     at 
> gobblin.yarn.GobblinHelixJobLauncher.close(GobblinHelixJobLauncher.java:131)
>     at com.google.common.io.Closer.close(Closer.java:214)
>     at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
>     at gobblin.yarn.GobblinHelixJob.execute(GobblinHelixJob.java:56)
>     ... 2 more
> 2016-02-02 21:10:07 UTC INFO  [GobblinHelixJobScheduler STOPPING] 
> org.quartz.core.QuartzScheduler  - Scheduler 
> DefaultQuartzScheduler_$_NON_CLUSTERED shutdown complete.
> ```
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/654 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2016-02-02T21:32:19Z 
> *Github Updated At* : 2017-01-12T04:37:31Z 
> h3. Comments 
> ----
> [~stakiar] wrote on 2016-02-04T23:52:45Z : @kadaan could you send over more 
> of the logs from the AppMaster; specifically, I'm interested in the logs that 
> are printed just before the first exception.
> If you can, a copy of the logs from a container would also be helpful.
> My current theory is the containers are not properly setting their exit 
> status when they get cancelled, which causes a timeout in the AppMaster. Some 
> notes for future reference:
> - The GobblinAppMaster handles all its shutdown logic in a Shutdown Hook
> - If a user executes `yarn application -kill [applicationId]` my guess is 
> that YARN will attempt to shut down the AppMaster as well as all the 
> containers, in no specific order.
> - The problem on cancellation of a container is that `GobblinHelixTask` does 
> not implement the `cancel` method and thus has no chance of setting its exit 
> status
> - This causes the `GobblinHelixJobLauncher.waitForJobCompletion` method to 
> loop indefinitely because, as far as Helix knows, the job has not been 
> cancelled
> - Eventually there is a timeout in `GobblinApplicationMaster.stop`, which 
> then causes a disconnection from Helix, which in turn causes all the 
> `HelixManager is not connected` exceptions
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/654#issuecomment-180109103 
> ----
> [~jbaranick] wrote on 2016-02-18T06:57:36Z : @sahilTakiar Attached additional 
> logs:
> - 
> [application_master_obfuscated.txt](https://github.com/linkedin/gobblin/files/135770/application_master_obfuscated.txt)
> - 
> [container_obfuscated.txt](https://github.com/linkedin/gobblin/files/135771/container_obfuscated.txt)
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/654#issuecomment-185569863
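
The theory in the 2016-02-04 comment (a task that never records an exit status
on cancellation leaves the waiter looping until a timeout fires) can be
illustrated with a minimal, self-contained sketch. It does not use the Helix or
Gobblin APIs: `SketchTask`, `Status`, `awaitTerminalStatus` and `runOnce` are
hypothetical names invented for this illustration, and the real fix in
`GobblinHelixTask` may look quite different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CancellableTaskSketch {

    /** Hypothetical terminal states; these are NOT Helix's own types. */
    enum Status { RUNNING, COMPLETED, CANCELED }

    /** Hypothetical task that records an exit status when it is cancelled. */
    static final class SketchTask implements Runnable {
        private final long workMillis;
        private final CountDownLatch cancelRequested = new CountDownLatch(1);
        private final CountDownLatch finished = new CountDownLatch(1);
        private volatile Status status = Status.RUNNING;

        SketchTask(long workMillis) {
            this.workMillis = workMillis;
        }

        @Override
        public void run() {
            try {
                // Simulated work: ends early if cancel() is called.
                boolean cancelled = cancelRequested.await(workMillis, TimeUnit.MILLISECONDS);
                status = cancelled ? Status.CANCELED : Status.COMPLETED;
            } catch (InterruptedException e) {
                status = Status.CANCELED;
                Thread.currentThread().interrupt();
            } finally {
                // Always publish a terminal status so waiters can stop waiting.
                finished.countDown();
            }
        }

        /** Without this method the waiter below can only time out. */
        void cancel() {
            cancelRequested.countDown();
        }

        /** Bounded wait, standing in for a "wait for job completion" loop. */
        Status awaitTerminalStatus(long waitMillis)
                throws InterruptedException, TimeoutException {
            if (!finished.await(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException("task did not report a terminal status");
            }
            return status;
        }
    }

    /** Starts a task, optionally cancels it, and waits for its status. */
    static Status runOnce(boolean cancel, long workMillis, long waitMillis)
            throws InterruptedException, TimeoutException {
        SketchTask task = new SketchTask(workMillis);
        Thread worker = new Thread(task, "sketch-task");
        worker.setDaemon(true);
        worker.start();
        if (cancel) {
            task.cancel();
        }
        return task.awaitTerminalStatus(waitMillis);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(true, 60_000, 5_000)); // cancelled task
        System.out.println(runOnce(false, 10, 5_000));    // task that finishes
    }
}
```

Calling `runOnce(false, 60_000, 100)` shows the failure mode described above:
nothing cancels the task, no terminal status is published in time, and the
waiter ends with a `TimeoutException`.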



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
