Hi Lee,

Thanks for letting us know. We would be happy to try out the latest version.
Can you please point me to those known issues (JIRA or GitHub issues) in the
latest version? Then we can decide whether they might affect our use case.

Thanks
Dimuthu

On Fri, Mar 22, 2019 at 4:29 PM Hunter Lee <naren...@gmail.com> wrote:

> Let me add a caveat to my previous email. Although it comes with
> scalability improvements, there are currently a few known issues with the
> latest version. We'd encourage you to check back to make sure your current
> usage isn't affected.
>
> Hunter
>
> On Fri, Mar 22, 2019 at 12:35 PM Hunter Lee <naren...@gmail.com> wrote:
>
> > No problem. If you have further questions, let us know what kind of load
> > you're putting on Helix as well. The newest version of Helix contains Task
> > Framework 2.0, and has greater scalability in scheduling tasks, so you
> > might want to consider using the newest version as well.
> >
> > Hunter
> >
> > On Fri, Mar 22, 2019 at 8:59 AM DImuthu Upeksha <
> > dimuthu.upeks...@gmail.com> wrote:
> >
> >> Hi Lee,
> >>
> >> Thanks for the trick. I didn't know that we can poke the controller like
> >> that :) However, we can now see that tasks are moving smoothly in our
> >> staging setup. This behavior appears from time to time and resolves
> >> itself automatically within a few hours. I can't find a particular
> >> pattern, but my best guess is that it happens when the load is high. I
> >> will put some load on our testing setup, see if I can reproduce the
> >> issue, try your instructions, and get back to you.
> >>
> >> Thanks
> >> Dimuthu
> >>
> >> On Thu, Mar 21, 2019 at 5:27 PM Hunter Lee <naren...@gmail.com> wrote:
> >>
> >> > Hi Dimuthu,
> >> >
> >> > What Junkai meant by touching the IdealState is this:
> >> >
> >> > 1) Use ZooInspector to connect to ZK
> >> > 2) Locate the IDEALSTATES/ path
> >> > 3) Open any ZNode under that path, modify it (just add a whitespace),
> >> > and save
> >> > 4) This triggers a ZK callback, which should tell the Helix Controller
> >> > to rebalance/schedule things
> >> >
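Step 3 above can also be scripted. The following is a minimal, hypothetical Python sketch. It assumes the kazoo ZooKeeper client, or any object with the same `get(path)` / `set(path, value, version=...)` shape, and the `/<cluster>/IDEALSTATES/<resource>` layout. The helper names are illustrative and are not part of Helix. Rewriting the current bytes unchanged should have the same effect as adding a whitespace, because ZooKeeper fires data watches on every successful `setData`.

```python
# Hypothetical helper: "touch" a Helix IdealState ZNode so that ZooKeeper
# fires a data-change watch and the controller re-runs its pipeline.
# `zk` is assumed to be a started kazoo KazooClient or any object offering
# get(path) -> (bytes, stat) and set(path, value, version=...) -> stat.


def ideal_state_path(cluster: str, resource: str) -> str:
    """Build the IdealState path: /<cluster>/IDEALSTATES/<resource>."""
    return f"/{cluster}/IDEALSTATES/{resource}"


def touch_ideal_state(zk, cluster: str, resource: str) -> int:
    """Rewrite the ZNode's current bytes unchanged; return the new version."""
    path = ideal_state_path(cluster, resource)
    data, stat = zk.get(path)
    # Conditional write: fails instead of overwriting a concurrent update.
    new_stat = zk.set(path, data, version=stat.version)
    return new_stat.version
```

With kazoo installed, usage would be `zk = KazooClient(hosts="localhost:2181")`, then `zk.start()`, then `touch_ideal_state(zk, "MyCluster", "SomeWorkflow")`. The host, cluster, and resource names are placeholders. Passing the version that was just read means a concurrent write makes the call fail (kazoo raises `BadVersionError`) instead of silently overwriting it.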
> >> > On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <
> >> > dimuthu.upeks...@gmail.com> wrote:
> >> >
> >> >> Hi Junkai,
> >> >>
> >> >> What do you mean by touching the ideal state to trigger an event? I
> >> >> didn't quite get what you said. Is that like creating some path in
> >> >> ZooKeeper? Workflows are eventually scheduled, but the problem is that
> >> >> scheduling is very slow due to that 30s freeze.
> >> >>
> >> >> Thanks
> >> >> Dimuthu
> >> >>
> >> >> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai <junkai....@gmail.com> wrote:
> >> >> >
> >> >> > Can you try one thing? Touch the ideal state to trigger an event. If
> >> >> > workflows are still not scheduled, then scheduling itself has a problem.
> >> >> >
> >> >> > Best,
> >> >> >
> >> >> > Junkai
> >> >> >
> >> >> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <
> >> >> > dimuthu.upeks...@gmail.com> wrote:
> >> >> >
> >> >> >> Hi Junkai,
> >> >> >>
> >> >> >> We are using 0.8.1
> >> >> >>
> >> >> >> Dimuthu
> >> >> >>
> >> >> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai <junkai....@gmail.com> wrote:
> >> >> >>
> >> >> >> > Hi Dimuthu,
> >> >> >> >
> >> >> >> > What's the version of Helix you are using?
> >> >> >> >
> >> >> >> > Best,
> >> >> >> >
> >> >> >> > Junkai
> >> >> >> >
> >> >> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <
> >> >> >> > dimuthu.upeks...@gmail.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> > > Hi Helix Dev,
> >> >> >> > >
> >> >> >> > > We are again seeing this delay in task execution. Please have a look
> >> >> >> > > at the screencast [1] of logs printed in the participant (top shell)
> >> >> >> > > and the controller (bottom shell). When I recorded this, there were
> >> >> >> > > about 90-100 workflows pending execution. As you can see, some tasks
> >> >> >> > > were suddenly executed and then the participant froze for about 30
> >> >> >> > > seconds before executing the next set of tasks. I can see some WARN
> >> >> >> > > logs in the controller log. I feel like this 30-second delay is some
> >> >> >> > > sort of pattern. What do you think is the reason for this? I can
> >> >> >> > > provide more information by turning on verbose logs on the controller
> >> >> >> > > if you want.
> >> >> >> > >
> >> >> >> > > [1] https://youtu.be/3EUdSxnIxVw
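
One way to check whether the roughly 30-second pauses are regular is to extract the times at which the participant starts tasks and list the idle gaps between them. The following is a minimal Python sketch. It assumes the start times have already been parsed from the participant log into `datetime` values; the log format itself is not assumed. `find_pauses` and its 20-second default threshold are hypothetical choices.

```python
# Hypothetical diagnostic: list idle gaps between task starts on a participant.
# The start times are assumed to have been parsed from the participant log
# already; no particular log format is assumed here.
from datetime import datetime, timedelta
from typing import List, Tuple


def find_pauses(
    start_times: List[datetime],
    threshold: timedelta = timedelta(seconds=20),
) -> List[Tuple[datetime, datetime, float]]:
    """Return (last_start_before_gap, first_start_after_gap, gap_seconds)
    for every gap between consecutive task starts longer than `threshold`."""
    ordered = sorted(start_times)
    pauses = []
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev
        if gap > threshold:
            pauses.append((prev, curr, gap.total_seconds()))
    return pauses
```

Gaps of nearly the same length recurring while workflows are still pending would support the idea of a periodic pattern. Irregular gaps would suggest looking elsewhere.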
> >> >> >> > >
> >> >> >> > > Thanks
> >> >> >> > > Dimuthu
> >> >> >> > >
> >> >> >> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
> >> >> >> > >
> >> >> >> > > > Hi Junkai,
> >> >> >> > > >
> >> >> >> > > > I'm CCing the Airavata dev list as this is directly related to the
> >> >> >> > > > project.
> >> >> >> > > >
> >> >> >> > > > I went through ZooKeeper paths such as /<Cluster Name>/EXTERNALVIEW
> >> >> >> > > > and /<Cluster Name>/CONFIGS/RESOURCE, because I noticed that the
> >> >> >> > > > Helix controller periodically monitors the children of those paths
> >> >> >> > > > even though all the workflows have moved into a saturated state such
> >> >> >> > > > as COMPLETED or STOPPED. In our case, a lot of completed workflows
> >> >> >> > > > have piled up in those paths. I believe Helix clears up those
> >> >> >> > > > resources after some TTL. What I did was write an external spectator
> >> >> >> > > > [1] that continuously monitors for saturated workflows and clears up
> >> >> >> > > > their resources before the controller does so after the TTL. After
> >> >> >> > > > that, we didn't see such delays in workflow execution, and everything
> >> >> >> > > > seems to be running smoothly. However, we are continuously
> >> >> >> > > > monitoring our deployments for any adverse effects introduced by
> >> >> >> > > > that improvement.
> >> >> >> > > >
> >> >> >> > > > Please let us know if we are doing something wrong with this
> >> >> >> > > > approach, or whether there is a better way to achieve this directly
> >> >> >> > > > through the Helix Task Framework.
> >> >> >> > > >
> >> >> >> > > > [1] https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
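
A cleanup pass like the one described above can be sketched as follows. This is a hypothetical Python illustration. It is not the Airavata `WorkflowCleanupAgent`, which is written in Java, and it is not the Helix API. `list_workflows`, `workflow_state`, and `delete` are placeholder method names for whatever task-driver calls the spectator actually makes. The terminal set mirrors the COMPLETED and STOPPED states mentioned in the message.

```python
# Hypothetical sketch of one cleanup pass over saturated workflows.
# `driver` stands in for a task driver; list_workflows(), workflow_state()
# and delete() are placeholder names, not the Helix TaskDriver API.
from typing import Iterable, List, Protocol

# States treated as saturated, following the message; extend as needed.
TERMINAL_STATES = frozenset({"COMPLETED", "STOPPED"})


class WorkflowDriver(Protocol):
    def list_workflows(self) -> Iterable[str]: ...
    def workflow_state(self, name: str) -> str: ...
    def delete(self, name: str) -> None: ...


def cleanup_once(driver: WorkflowDriver) -> List[str]:
    """Delete every workflow in a terminal state; return the deleted names."""
    deleted = []
    # Snapshot the names first so deletions don't mutate what we iterate.
    for name in list(driver.list_workflows()):
        if driver.workflow_state(name) in TERMINAL_STATES:
            driver.delete(name)
            deleted.append(name)
    return deleted
```

A spectator would call `cleanup_once` in a loop with a sleep between passes. Any workflow whose state is not in `TERMINAL_STATES` is left alone, including one with no state yet if the driver returns `None`. Whether deleting STOPPED workflows is safe depends on whether they may still be resumed.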
> >> >> >> > > >
> >> >> >> > > > Thanks
> >> >> >> > > > Dimuthu
> >> >> >> > > >
> >> >> >> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <junkai....@gmail.com> wrote:
> >> >> >> > > >
> >> >> >> > > >> Could you please check the logs for how long each pipeline stage
> >> >> >> > > >> takes?
> >> >> >> > > >>
> >> >> >> > > >> Also, did you set an expiry for the workflows? Have they been piling
> >> >> >> > > >> up for a long time? How long does each workflow take to complete?
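
The expiry question comes down to simple arithmetic. Assume a finished workflow is purged once its finish time plus its expiry has passed. Under that assumption, any finished workflow still inside that window keeps occupying ZooKeeper. Below is a minimal, hypothetical Python sketch; the `WorkflowRecord` fields are placeholders, not Helix config names, and times are epoch milliseconds.

```python
# Hypothetical model of workflow expiry: a finished workflow is assumed to be
# purged once now >= finish time + expiry. Times are epoch milliseconds and
# the record fields are placeholders, not Helix's own config names.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowRecord:
    name: str
    finish_ms: Optional[int]  # None while the workflow has not finished
    expiry_ms: int


def purgeable(records: List[WorkflowRecord], now_ms: int) -> List[str]:
    """Finished workflows whose expiry window has elapsed."""
    return [r.name for r in records
            if r.finish_ms is not None and now_ms >= r.finish_ms + r.expiry_ms]


def piled_up(records: List[WorkflowRecord], now_ms: int) -> List[str]:
    """Finished workflows still inside their expiry window (not yet purged)."""
    return [r.name for r in records
            if r.finish_ms is not None and now_ms < r.finish_ms + r.expiry_ms]
```

At a steady submission rate, the size of `piled_up` settles at roughly rate × expiry. For example, 15 workflows per minute with a one-hour expiry would keep about 900 finished workflows around.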
> >> >> >> > > >>
> >> >> >> > > >> best,
> >> >> >> > > >>
> >> >> >> > > >> Junkai
> >> >> >> > > >>
> >> >> >> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
> >> >> >> > > >> dimuthu.upeks...@gmail.com>
> >> >> >> > > >> wrote:
> >> >> >> > > >>
> >> >> >> > > >> > Hi Junkai,
> >> >> >> > > >> >
> >> >> >> > > >> > The average load is around 10-20 workflows per minute; in some
> >> >> >> > > >> > cases it's less than that. However, based on our observations, I
> >> >> >> > > >> > feel it does not depend on the load and is sporadic. Are there
> >> >> >> > > >> > particular log lines that I can filter in the controller and
> >> >> >> > > >> > participant to capture the timeline of a workflow, so that I can
> >> >> >> > > >> > figure out which component is malfunctioning? We use Helix v0.8.1.
> >> >> >> > > >> >
> >> >> >> > > >> > Thanks
> >> >> >> > > >> > Dimuthu
> >> >> >> > > >> >
> >> >> >> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <junkai....@gmail.com> wrote:
> >> >> >> > > >> >
> >> >> >> > > >> > > Hi Dimuthu,
> >> >> >> > > >> > >
> >> >> >> > > >> > > At what rate are you submitting workflows? Usually, workflow
> >> >> >> > > >> > > scheduling is very fast. And which version of Helix are you using?
> >> >> >> > > >> > >
> >> >> >> > > >> > > Best,
> >> >> >> > > >> > >
> >> >> >> > > >> > > Junkai
> >> >> >> > > >> > >
> >> >> >> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
> >> >> >> > > >> > > dimuthu.upeks...@gmail.com>
> >> >> >> > > >> > > wrote:
> >> >> >> > > >> > >
> >> >> >> > > >> > > > Hi Folks,
> >> >> >> > > >> > > >
> >> >> >> > > >> > > > We have noticed some delays between workflow submission and the
> >> >> >> > > >> > > > actual pickup by participants, and the delay seems to be fairly
> >> >> >> > > >> > > > constant at around 2-3 minutes. We continuously submit workflows,
> >> >> >> > > >> > > > and after 2-3 minutes a bulk of workflows is picked up and
> >> >> >> > > >> > > > executed by the participant. Then it remains silent for the next
> >> >> >> > > >> > > > 2-3 minutes even if we submit more workflows. It's as if the
> >> >> >> > > >> > > > participant picks up workflows at discrete time intervals. I'm
> >> >> >> > > >> > > > not sure whether this is an issue with the controller or the
> >> >> >> > > >> > > > participant. Do you have any experience with this sort of
> >> >> >> > > >> > > > behavior?
> >> >> >> > > >> > > >
> >> >> >> > > >> > > > Thanks
> >> >> >> > > >> > > > Dimuthu
> >> >> >> > > >> > > >
> >> >> >> > > >> > >
> >> >> >> > > >> > >
> >> >> >> > > >> > > --
> >> >> >> > > >> > > Junkai Xue
> >> >> >> > > >> > >
> >> >> >> > > >> >
> >> >> >> > > >>
> >> >> >> > > >>
> >> >> >> > > >> --
> >> >> >> > > >> Junkai Xue
> >> >> >> > > >>
> >> >> >> > > >
> >> >> >> > >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Junkai Xue
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Junkai Xue
> >> >> >
> >> >>
> >> >
> >>
> >
>
