Hi Junkai,

There is a cleanup agent [1] that monitors the currently available
workflows and deletes completed and failed ones to free up ZooKeeper
storage. Do you think this could be causing the issue?
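
To make the suspicion concrete, here is a minimal sketch in plain Java (no
Helix classes) of how a deletion by the cleanup agent could lead to the
NullPointerException in the trace. Every name in it is hypothetical, and it
assumes that the updater in TaskUtil is handed a null record once the znode
is gone; I have not confirmed that in the Helix source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the ZooKeeper-backed user content store.
class UserContentRaceSketch {
    // path -> key/value content; a missing path models a deleted znode.
    static final Map<String, Map<String, String>> store = new HashMap<>();

    // Hands the current record (null if the node is missing) to the updater.
    // Returns false when the updater declines to write.
    static boolean update(String path, UnaryOperator<Map<String, String>> updater) {
        Map<String, String> updated = updater.apply(store.get(path));
        if (updated == null) {
            return false;
        }
        store.put(path, updated);
        return true;
    }

    // Assumed shape of the failing updater: no null check on the record.
    static boolean putUnsafe(String path, String key, String value) {
        return update(path, current -> {
            current.put(key, value); // throws NullPointerException if the node is gone
            return current;
        });
    }

    // Null-safe variant: skips the write when the node has been deleted.
    static boolean putSafe(String path, String key, String value) {
        return update(path, current -> {
            if (current == null) {
                return null; // node was removed, e.g. by a cleanup agent
            }
            current.put(key, value);
            return current;
        });
    }

    public static void main(String[] args) {
        store.put("/workflow-1", new HashMap<>());
        System.out.println(putSafe("/workflow-1", "NEXT_JOB", "job2"));

        store.remove("/workflow-1"); // simulate the cleanup agent deleting the workflow
        System.out.println(putSafe("/workflow-1", "NEXT_JOB", "job3"));
        try {
            putUnsafe("/workflow-1", "NEXT_JOB", "job3");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the unguarded updater");
        }
    }
}
```

If this is what is happening, the write only fails when the cleanup runs
between the task starting and the task calling putUserContent, which would
fit how rarely we see it.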

[1]
https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java

Thanks
Dimuthu

On Fri, Nov 9, 2018 at 11:14 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
wrote:

> Hi Junkai,
>
> There is no manual workflow-killing logic implemented, but as you
> suggested, I need to verify that. Unfortunately, all the Helix log levels on
> our servers were set to WARN because Helix prints a large volume of logs at
> INFO level, so there is not much useful information in the logs. Can you
> tell me which class prints the logs associated with workflow termination?
> I'll enable DEBUG level for that class and observe further.
>
> Thanks
> Dimuthu
>
> On Fri, Nov 9, 2018 at 9:18 PM Xue Junkai <junkai....@gmail.com> wrote:
>
>> Hmm, that's very strange. The user content store znode is only deleted
>> when the workflow is gone, and the log shows that the znode is gone. Could
>> you please dig through the logs to find out whether the workflow was
>> manually killed? If that's the case, then it is possible to hit this
>> problem.
>>
>> On Fri, Nov 9, 2018 at 12:13 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>>
>> > Hi Junkai,
>> >
>> > Thanks for your suggestion. You have captured most of it correctly.
>> > There are two jobs, job1 and job2, and job2 depends on job1. Until job1
>> > is completed, job2 should not be scheduled. Task 1 in job1 is calling
>> > that method, and it is not updating anyone's content. It's just putting
>> > a value at the workflow level. What do you mean by keeping a key-value
>> > store at the workflow level? I already use the key-value store provided
>> > by Helix by calling the putUserContent method.
>> >
>> > public void sendNextJob(String jobId) {
>> >     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
>> >     if (jobId != null) {
>> >         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
>> >     }
>> > }
>> >
>> > Dimuthu
>> >
>> >
>> > On Fri, Nov 9, 2018 at 2:48 PM Xue Junkai <junkai....@gmail.com> wrote:
>> >
>> > > In my understanding, it could be that you have job1 and job2, and the
>> > > task running in job1 tries to update content for job2. Then there could
>> > > be a race condition in which job2 is not yet scheduled.
>> > >
>> > > If that's the case, I suggest putting the key-value store at the
>> > > workflow level, since this is a cross-job operation.
>> > >
>> > > Best,
>> > >
>> > > Junkai
>> > >
>> > > On Fri, Nov 9, 2018 at 11:45 AM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>> > >
>> > > > Hi Junkai,
>> > > >
>> > > > This method is being called inside a running task, and it works most
>> > > > of the time. I have only seen this on 2 occasions in the last few
>> > > > months, and they happened today and yesterday.
>> > > >
>> > > > Thanks
>> > > > Dimuthu
>> > > >
>> > > > On Fri, Nov 9, 2018 at 2:40 PM Xue Junkai <junkai....@gmail.com> wrote:
>> > > >
>> > > > > The user content store node will be created once the job has been
>> > > > > scheduled. In your case, I think the job is not scheduled. This
>> > > > > method is usually used in a running task.
>> > > > >
>> > > > > Best,
>> > > > >
>> > > > > Junkai
>> > > > >
>> > > > > On Fri, Nov 9, 2018 at 8:19 AM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Helix Folks,
>> > > > > >
>> > > > > > I'm having a sporadic issue in some tasks of our workflows when we
>> > > > > > try to store a value in the workflow context. I have added both the
>> > > > > > code section and the error message below. Do you have any idea
>> > > > > > what's causing this? Please let me know if you need further
>> > > > > > information. We are using Helix 0.8.2.
>> > > > > >
>> > > > > > public void sendNextJob(String jobId) {
>> > > > > >     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
>> > > > > >     if (jobId != null) {
>> > > > > >         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
>> > > > > >     }
>> > > > > > }
>> > > > > >
>> > > > > > Failed to setup environment of task TASK_55096de4-2cb6-4b09-84fd-7fdddba93435
>> > > > > > java.lang.NullPointerException: null
>> > > > > >         at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:358)
>> > > > > >         at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:356)
>> > > > > >         at org.apache.helix.manager.zk.HelixGroupCommit.commit(HelixGroupCommit.java:126)
>> > > > > >         at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.update(ZkCacheBaseDataAccessor.java:306)
>> > > > > >         at org.apache.helix.store.zk.AutoFallbackPropertyStore.update(AutoFallbackPropertyStore.java:61)
>> > > > > >         at org.apache.helix.task.TaskUtil.addWorkflowJobUserContent(TaskUtil.java:356)
>> > > > > >         at org.apache.helix.task.UserContentStore.putUserContent(UserContentStore.java:78)
>> > > > > >         at org.apache.airavata.helix.core.AbstractTask.sendNextJob(AbstractTask.java:136)
>> > > > > >         at org.apache.airavata.helix.core.OutPort.invoke(OutPort.java:42)
>> > > > > >         at org.apache.airavata.helix.core.AbstractTask.onSuccess(AbstractTask.java:123)
>> > > > > >         at org.apache.airavata.helix.impl.task.AiravataTask.onSuccess(AiravataTask.java:97)
>> > > > > >         at org.apache.airavata.helix.impl.task.env.EnvSetupTask.onRun(EnvSetupTask.java:52)
>> > > > > >         at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:349)
>> > > > > >         at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:92)
>> > > > > >         at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
>> > > > > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> > > > > >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > > > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>> > > > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> > > > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> > > > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> > > > > >         at java.lang.Thread.run(Thread.java:748)
>> > > > > >
>> > > > > > Thanks
>> > > > > > Dimuthu
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Junkai Xue
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Junkai Xue
>> > >
>> >
>>
>>
>> --
>> Junkai Xue
>>
>
