Hi Junkai,

There is a cleanup agent [1] that monitors the currently available workflows and deletes completed and failed workflows to free up ZooKeeper storage. Do you think this could be causing the issue?
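To illustrate the suspected failure mode, here is a minimal, self-contained sketch of the race. It assumes, without confirmation from the Helix source, that the updater behind putUserContent receives a null record when the user-content znode no longer exists and then dereferences it. That behavior would be consistent with the NullPointerException at TaskUtil$1.update in the stack trace. The in-memory map stands in for the ZooKeeper property store. The path `/TaskRebalancer/wf1/UserContent` and the names `CleanupRaceSketch`, `update`, `put` and `runScenario` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a ZooKeeper-backed property store.
// update() passes the current record to the updater, or null if the node is gone.
class CleanupRaceSketch {
    // Hypothetical node path; not taken from the Helix source.
    static final String PATH = "/TaskRebalancer/wf1/UserContent";
    static final Map<String, Map<String, String>> store = new ConcurrentHashMap<>();

    static void update(String path, UnaryOperator<Map<String, String>> updater) {
        // compute() hands the updater null when the key is absent, like a deleted znode.
        // If the updater throws, the exception propagates and the store is left unchanged.
        store.compute(path, (p, current) -> updater.apply(current));
    }

    // Naive updater that assumes the node still exists (assumption, see lead-in).
    static UnaryOperator<Map<String, String>> put(String key, String value) {
        return current -> {
            current.put(key, value); // NullPointerException if the node was deleted
            return current;
        };
    }

    // Performs one write, optionally letting the "cleanup agent" delete the node first.
    static String runScenario(boolean cleanupFirst) {
        store.clear();
        store.put(PATH, new ConcurrentHashMap<>());
        if (cleanupFirst) {
            store.remove(PATH); // simulated cleanup agent removes the workflow's node
        }
        try {
            update(PATH, put("WORKFLOW_STARTED", "TRUE"));
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario(false)); // ok
        System.out.println(runScenario(true));  // NPE
    }
}
```

The same write succeeds when the node exists and fails with a NullPointerException when the node has been removed first. This is the pattern one would expect if the cleanup agent deleted a workflow while one of its tasks was still writing user content.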
[1] https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java

Thanks
Dimuthu

On Fri, Nov 9, 2018 at 11:14 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:

> Hi Junkai,
>
> There is no manual workflow-killing logic implemented, but as you suggested, I need to verify that. Unfortunately, all the Helix log levels on our servers were set to WARN because Helix prints a large volume of logs at INFO level, so there is not much useful information in the logs. Could you tell me which class logs workflow termination? I'll enable DEBUG level for that class and observe further.
>
> Thanks
> Dimuthu
>
> On Fri, Nov 9, 2018 at 9:18 PM Xue Junkai <junkai....@gmail.com> wrote:
>
>> Hmm, that's very strange. The user content store znode is only deleted when the workflow is gone, and the log shows that the znode is gone. Could you dig through the logs to find out whether the workflow was manually killed? If so, that could explain the problem.
>>
>> On Fri, Nov 9, 2018 at 12:13 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>>
>>> Hi Junkai,
>>>
>>> Thanks for your suggestion. You have captured most of it correctly. There are two jobs, job1 and job2, and job2 depends on job1: until job1 is completed, job2 should not be scheduled. Task 1 in job1 calls that method, and it is not updating anyone else's content; it just puts a value at the workflow level. What do you mean by keeping a key-value store at the workflow level? I already use the key-value store provided by Helix by calling the putUserContent method.
>>>
>>> public void sendNextJob(String jobId) {
>>>     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
>>>     if (jobId != null) {
>>>         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
>>>     }
>>> }
>>>
>>> Dimuthu
>>>
>>> On Fri, Nov 9, 2018 at 2:48 PM Xue Junkai <junkai....@gmail.com> wrote:
>>>
>>>> As I understand it, you may have job1 and job2, where the task running in job1 tries to update content for job2. In that case there could be a race condition in which job2 has not been scheduled yet.
>>>>
>>>> If that's the case, I suggest putting the key-value store at the workflow level, since this is a cross-job operation.
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>> On Fri, Nov 9, 2018 at 11:45 AM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>>>>
>>>>> Hi Junkai,
>>>>>
>>>>> This method is called inside a running task, and it works most of the time. I have seen this failure only twice in the last few months, and both occurrences were today and yesterday.
>>>>>
>>>>> Thanks
>>>>> Dimuthu
>>>>>
>>>>> On Fri, Nov 9, 2018 at 2:40 PM Xue Junkai <junkai....@gmail.com> wrote:
>>>>>
>>>>>> The user content store node is created once the job has been scheduled. In your case, I think the job has not been scheduled. This method is usually used inside a running task.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Fri, Nov 9, 2018 at 8:19 AM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Helix Folks,
>>>>>>>
>>>>>>> I'm seeing a sporadic issue in some tasks of our workflows when we try to store a value in the workflow context. I have added both the code section and the error message below. Do you have an idea what's causing this? Please let me know if you need further information. We are using Helix 0.8.2.
>>>>>>>
>>>>>>> public void sendNextJob(String jobId) {
>>>>>>>     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
>>>>>>>     if (jobId != null) {
>>>>>>>         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Failed to setup environment of task TASK_55096de4-2cb6-4b09-84fd-7fdddba93435
>>>>>>> java.lang.NullPointerException: null
>>>>>>>     at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:358)
>>>>>>>     at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:356)
>>>>>>>     at org.apache.helix.manager.zk.HelixGroupCommit.commit(HelixGroupCommit.java:126)
>>>>>>>     at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.update(ZkCacheBaseDataAccessor.java:306)
>>>>>>>     at org.apache.helix.store.zk.AutoFallbackPropertyStore.update(AutoFallbackPropertyStore.java:61)
>>>>>>>     at org.apache.helix.task.TaskUtil.addWorkflowJobUserContent(TaskUtil.java:356)
>>>>>>>     at org.apache.helix.task.UserContentStore.putUserContent(UserContentStore.java:78)
>>>>>>>     at org.apache.airavata.helix.core.AbstractTask.sendNextJob(AbstractTask.java:136)
>>>>>>>     at org.apache.airavata.helix.core.OutPort.invoke(OutPort.java:42)
>>>>>>>     at org.apache.airavata.helix.core.AbstractTask.onSuccess(AbstractTask.java:123)
>>>>>>>     at org.apache.airavata.helix.impl.task.AiravataTask.onSuccess(AiravataTask.java:97)
>>>>>>>     at org.apache.airavata.helix.impl.task.env.EnvSetupTask.onRun(EnvSetupTask.java:52)
>>>>>>>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:349)
>>>>>>>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:92)
>>>>>>>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>     at java.lang.Thread.run(Thread.java:748)
>>>>>>>
>>>>>>> Thanks
>>>>>>> Dimuthu
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Junkai Xue
>>>>>
>>>>
>>>>
>>>> --
>>>> Junkai Xue
>>>
>>
>>
>> --
>> Junkai Xue
>
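If the cleanup agent does turn out to be the cause, one possible mitigation is to delay deletion. This is a hypothetical policy, not the logic in WorkflowCleanupAgent. It selects only COMPLETED or FAILED workflows whose finish time is at least a grace period in the past, so that tasks still wrapping up have time to finish their writes. The `State` enum, the `WorkflowInfo` snapshot, and the 10-minute grace period are assumptions for illustration. A real agent would read workflow state through Helix rather than from a map.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cleanup policy: select only terminal workflows that finished
// at least `grace` ago. Not the actual WorkflowCleanupAgent logic.
class GracePeriodCleanupSketch {
    // Hypothetical subset of workflow states, for illustration only.
    enum State { IN_PROGRESS, COMPLETED, FAILED }

    // Hypothetical per-workflow snapshot that an agent might read.
    static final class WorkflowInfo {
        final State state;
        final Instant finishedAt; // null while the workflow is still running

        WorkflowInfo(State state, Instant finishedAt) {
            this.state = state;
            this.finishedAt = finishedAt;
        }
    }

    static List<String> selectForDeletion(Map<String, WorkflowInfo> workflows,
                                          Instant now, Duration grace) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, WorkflowInfo> e : workflows.entrySet()) {
            WorkflowInfo info = e.getValue();
            boolean terminal = info.state == State.COMPLETED || info.state == State.FAILED;
            // Eligible once finishedAt + grace is not after now.
            if (terminal && info.finishedAt != null && !info.finishedAt.plus(grace).isAfter(now)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    static List<String> demo() {
        Instant now = Instant.parse("2018-11-09T12:00:00Z");
        Map<String, WorkflowInfo> workflows = new TreeMap<>();
        workflows.put("wf-running", new WorkflowInfo(State.IN_PROGRESS, null));
        workflows.put("wf-just-completed", new WorkflowInfo(State.COMPLETED, now.minusSeconds(30)));
        workflows.put("wf-old-failed", new WorkflowInfo(State.FAILED, now.minusSeconds(3600)));
        return selectForDeletion(workflows, now, Duration.ofMinutes(10));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [wf-old-failed]
    }
}
```

In the demo, only `wf-old-failed` is selected. `wf-just-completed` finished 30 seconds earlier and is still inside the grace period, and `wf-running` is not terminal.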