Eroma created AIRAVATA-2736:
-------------------------------
Summary: Job submitted and running in HPC while the experiment is
tagged as FAILED
Key: AIRAVATA-2736
URL: https://issues.apache.org/jira/browse/AIRAVATA-2736
Project: Airavata
Issue Type: Bug
Components: helix implementation
Affects Versions: 0.18
Environment: http://149.165.168.248:8008/ - Helix test env
Reporter: Eroma
Assignee: Dimuthu Upeksha
Fix For: 0.18
# Submitted an experiment, which in turn submitted the job.
# The job ID was returned and the job status is ACTIVE.
# Due to a ZooKeeper connection issue, the experiment was marked FAILED.
# The job is still running on the HPC resource.
# Airavata is not waiting for job monitoring, because the task status was not updated in ZooKeeper.
# Error in the log: [1]
# Experiment ID: SLM001-AmberSander-BR2_5ed5a19f-ab44-4eba-afb7-1feafaf0bbdd
[1]
|org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /monitoring/2159926/lock
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
    at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:696)
    at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:679)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:676)
    at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
    at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
    at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
    at org.apache.airavata.helix.impl.task.submission.JobSubmissionTask.createMonitoringNode(JobSubmissionTask.java:83)
    at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:144)
    at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:264)
    at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:74)
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:70)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)|
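
One possible way to avoid this state, offered only as a sketch and not as Airavata's actual fix: once the scheduler has returned a job ID, retry the monitoring-node creation a few times, and if it still fails, report the job as submitted but unmonitored instead of failing the experiment. The names MonitoringRegistration, Outcome, and registerWithRetry are hypothetical. The Callable stands in for the call made in JobSubmissionTask.createMonitoringNode, and any exception it throws stands in for ConnectionLossException.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper; none of these names exist in Airavata. */
final class MonitoringRegistration {

    /** Result of trying to register an already-submitted job for monitoring. */
    enum Outcome { MONITORED, SUBMITTED_UNMONITORED }

    private MonitoringRegistration() {}

    /**
     * Calls registerNode up to maxAttempts times. After a failed attempt that is
     * not the last one, it sleeps backoffMillis * attemptNumber before retrying.
     * It never throws on registration failure, so the caller can keep the job
     * alive instead of marking the experiment FAILED.
     */
    static Outcome registerWithRetry(Callable<Void> registerNode,
                                     int maxAttempts,
                                     long backoffMillis) {
        if (maxAttempts < 1 || backoffMillis < 0) {
            throw new IllegalArgumentException("maxAttempts >= 1 and backoffMillis >= 0 required");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                registerNode.call();
                return Outcome.MONITORED;
            } catch (Exception e) {
                // Assumption: every failure is treated as transient, like a connection loss.
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                    break;
                }
                if (attempt < maxAttempts && !sleepQuietly(backoffMillis * attempt)) {
                    break;
                }
            }
        }
        return Outcome.SUBMITTED_UNMONITORED;
    }

    /** Sleeps for the given time; returns false if the thread was interrupted. */
    private static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would treat SUBMITTED_UNMONITORED as a signal to schedule a later registration attempt or raise an alert, since the job is still running on the HPC resource. Note that the trace above already passes through Curator's RetryLoop, so a retry at this level only helps if the connection recovers after Curator's own retry policy has given up.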
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)