Can you try one thing? Touch the ideal state to trigger an event. If the
workflows are still not scheduled, the problem is most likely in scheduling.
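
As a rough illustration of what "touching" means here: read the ideal state and write it back unchanged, so that the write itself produces a change event. The sketch below is self-contained and uses a hypothetical `IdealStateStore` interface with an in-memory implementation. With Helix, the two calls would presumably be backed by `HelixAdmin.getResourceIdealState` and `HelixAdmin.setResourceIdealState`; that mapping, and all class, cluster, and resource names, are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two admin calls needed to touch an ideal state.
// Assumption: with Helix these would map to HelixAdmin.getResourceIdealState(...)
// and HelixAdmin.setResourceIdealState(...).
interface IdealStateStore {
    Map<String, String> read(String cluster, String resource);
    void write(String cluster, String resource, Map<String, String> idealState);
}

// In-memory store that counts writes, so the sketch runs without ZooKeeper.
class InMemoryIdealStateStore implements IdealStateStore {
    private final Map<String, Map<String, String>> records = new HashMap<>();
    int writeCount = 0;

    @Override
    public Map<String, String> read(String cluster, String resource) {
        Map<String, String> current = records.get(cluster + "/" + resource);
        return current == null ? null : new HashMap<>(current);
    }

    @Override
    public void write(String cluster, String resource, Map<String, String> idealState) {
        records.put(cluster + "/" + resource, new HashMap<>(idealState));
        writeCount++;
    }
}

class IdealStateToucher {
    // Rewrites the ideal state with identical content.
    // Returns false when the resource has no ideal state to touch.
    static boolean touch(IdealStateStore store, String cluster, String resource) {
        Map<String, String> current = store.read(cluster, resource);
        if (current == null) {
            return false;
        }
        store.write(cluster, resource, current);
        return true;
    }

    public static void main(String[] args) {
        InMemoryIdealStateStore store = new InMemoryIdealStateStore();
        Map<String, String> fields = new HashMap<>();
        fields.put("exampleField", "exampleValue");
        store.write("exampleCluster", "exampleWorkflow", fields);
        boolean touched = touch(store, "exampleCluster", "exampleWorkflow");
        System.out.println("touched: " + touched);
    }
}
```

The content is deliberately left unchanged: the point is only to generate a write, so that you can see whether the controller reacts to it.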

Best,

Junkai

On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
wrote:

> Hi Junkai,
>
> We are using 0.8.1
>
> Dimuthu
>
> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai <junkai....@gmail.com> wrote:
>
> > Hi Dimuthu,
> >
> > What's the version of Helix you are using?
> >
> > Best,
> >
> > Junkai
> >
> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <
> > dimuthu.upeks...@gmail.com>
> > wrote:
> >
> > > Hi Helix Dev,
> > >
> > > We are again seeing this delay in task execution. Please have a look at
> > > the screencast [1] of the logs printed by the participant (top shell) and
> > > the controller (bottom shell). When I recorded this, there were about
> > > 90 - 100 workflows pending execution. As you can see, some tasks were
> > > suddenly executed, and then the participant froze for about 30 seconds
> > > before executing the next set of tasks. I can see some WARN logs in the
> > > controller log. This 30-second delay seems to follow some sort of pattern.
> > > What do you think is the reason for this? I can provide more information
> > > by turning on verbose logs on the controller if you want.
> > >
> > > [1] https://youtu.be/3EUdSxnIxVw
> > >
> > > Thanks
> > > Dimuthu
> > >
> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha
> > > <dimuthu.upeks...@gmail.com> wrote:
> > >
> > > > Hi Junkai,
> > > >
> > > > I'm CCing the Airavata dev list as this is directly related to the
> > > > project.
> > > >
> > > > I just went through ZooKeeper paths like /<Cluster Name>/EXTERNALVIEW and
> > > > /<Cluster Name>/CONFIGS/RESOURCE, as I noticed that the Helix controller
> > > > periodically monitors the children of those paths even though all the
> > > > workflows have moved into a saturated state like COMPLETED or STOPPED. In
> > > > our case, we have a lot of completed workflows piled up in those paths. I
> > > > believe that Helix clears up those resources after some TTL. What I did
> > > > was write an external spectator [1] that continuously monitors for
> > > > saturated workflows and clears up their resources before the controller
> > > > does so after the TTL. After that, we didn't see such delays in workflow
> > > > execution, and everything seems to be running smoothly. However, we are
> > > > continuously monitoring our deployments for any adverse effects introduced
> > > > by that improvement.
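
A minimal sketch of one cleanup pass of the kind described above, assuming COMPLETED and STOPPED are the only states that qualify for removal. This is not the code of the linked WorkflowCleanupAgent. `WorkflowRegistry`, `InMemoryWorkflowRegistry`, `WorkflowState`, and `WorkflowCleanupSketch` are hypothetical names introduced for illustration, and the wiring to the Helix task framework is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical workflow states; COMPLETED and STOPPED are the ones named in the thread.
enum WorkflowState { IN_PROGRESS, COMPLETED, STOPPED }

// Hypothetical view of the workflows known to the cluster.
interface WorkflowRegistry {
    Map<String, WorkflowState> listWorkflows();
    void delete(String workflowName);
}

// In-memory registry so the sketch runs without a cluster.
class InMemoryWorkflowRegistry implements WorkflowRegistry {
    private final Map<String, WorkflowState> workflows = new LinkedHashMap<>();

    void put(String name, WorkflowState state) {
        workflows.put(name, state);
    }

    @Override
    public Map<String, WorkflowState> listWorkflows() {
        // Return a snapshot so callers can delete while iterating over it.
        return new LinkedHashMap<>(workflows);
    }

    @Override
    public void delete(String workflowName) {
        workflows.remove(workflowName);
    }
}

class WorkflowCleanupSketch {
    // Deletes every workflow that is COMPLETED or STOPPED and returns the removed names.
    static List<String> cleanupOnce(WorkflowRegistry registry) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, WorkflowState> entry : registry.listWorkflows().entrySet()) {
            WorkflowState state = entry.getValue();
            if (state == WorkflowState.COMPLETED || state == WorkflowState.STOPPED) {
                registry.delete(entry.getKey());
                removed.add(entry.getKey());
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        InMemoryWorkflowRegistry registry = new InMemoryWorkflowRegistry();
        registry.put("wf-running", WorkflowState.IN_PROGRESS);
        registry.put("wf-done", WorkflowState.COMPLETED);
        System.out.println("removed: " + cleanupOnce(registry));
    }
}
```

A real agent would call such a pass on a schedule; the interval and any grace period before deletion are deployment choices not covered by this sketch.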
> > > >
> > > > Please let us know if we are doing something wrong in this improvement,
> > > > or if there is a better way to achieve this directly through the Helix
> > > > task framework.
> > > >
> > > > [1]
> > > > https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
> > > >
> > > > Thanks
> > > > Dimuthu
> > > >
> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <junkai....@gmail.com>
> > > > wrote:
> > > >
> > > >> Could you please check the logs for how long each pipeline stage takes?
> > > >>
> > > >> Also, did you set an expiry for the workflows? Have they been piled up for
> > > >> a long time? How long does each workflow take to complete?
> > > >>
> > > >> best,
> > > >>
> > > >> Junkai
> > > >>
> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
> > > >> dimuthu.upeks...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi Junkai,
> > > >> >
> > > >> > The average load is about 10 - 20 workflows per minute. In some cases
> > > >> > it's less than that. However, based on my observations, I feel that it
> > > >> > does not depend on the load and is sporadic. Are there particular log
> > > >> > lines that I can filter for in the controller and participant to capture
> > > >> > the timeline of a workflow, so that I can figure out which component is
> > > >> > malfunctioning? We use Helix v0.8.1.
> > > >> >
> > > >> > Thanks
> > > >> > Dimuthu
> > > >> >
> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <junkai....@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Dimuthu,
> > > >> > >
> > > >> > > At what rate are you submitting workflows? Usually, workflow
> > > >> > > scheduling is very fast. And which version of Helix are you using?
> > > >> > >
> > > >> > > Best,
> > > >> > >
> > > >> > > Junkai
> > > >> > >
> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
> > > >> > > dimuthu.upeks...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi Folks,
> > > >> > > >
> > > >> > > > We have noticed some delays between workflow submission and the
> > > >> > > > actual pickup by participants, and the delay seems to be fairly
> > > >> > > > constant at around 2 - 3 minutes. We continuously submit workflows,
> > > >> > > > and after 2 - 3 minutes, a batch of workflows is picked up by the
> > > >> > > > participant and executed. Then it remains silent for the next 2 - 3
> > > >> > > > minutes even if we submit more workflows. It's as if the participant
> > > >> > > > picks up workflows at discrete time intervals. I'm not sure whether
> > > >> > > > this is an issue with the controller or the participant. Do you have
> > > >> > > > any experience with this sort of behavior?
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > > Dimuthu
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Junkai Xue
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> Junkai Xue
> > > >>
> > > >
> > >
> >
> >
> > --
> > Junkai Xue
> >
>


-- 
Junkai Xue
