Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

Nan Zhu Wed, 16 Aug 2017 12:27:26 -0700

I am using your word, *current*. What is the definition of "current" in
Livy? I think it means every application that still has some record in the
Livy process's memory space.


So:

1. How do you express this "current" in a query to YARN? I think you have to
use the ApplicationId (maybe there are other ways) in the query.

2. The problem is that I don't see an API for making such a "big call"
by passing in all of the applications' IDs.
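
For illustration only, here is one way to read the "big call" idea when no
API accepts a list of IDs: issue a single bulk listing request and pick out
the tracked IDs on the client side. This is a minimal sketch, not Livy code.
`fetch_all` is a hypothetical stand-in for one bulk listing call to YARN,
and the app IDs and the `_CLUSTER` data are made up.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical stand-in for ONE bulk listing request to YARN, filtered by
# state. It returns a mapping of application ID -> state.
BulkFetch = Callable[[Set[str]], Dict[str, str]]


def states_for_tracked(tracked_ids: Iterable[str],
                       states_of_interest: Set[str],
                       fetch_all: BulkFetch) -> Dict[str, str]:
    """Make one bulk request, then keep only the apps that are tracked."""
    snapshot = fetch_all(states_of_interest)  # single round trip
    tracked = set(tracked_ids)
    return {app_id: state for app_id, state in snapshot.items()
            if app_id in tracked}


# Fake cluster contents, for illustration only.
_CLUSTER = {"app_1": "RUNNING", "app_2": "FAILED",
            "app_3": "RUNNING", "app_4": "FINISHED"}
calls = []


def fake_fetch(states: Set[str]) -> Dict[str, str]:
    calls.append(states)  # record how many requests were made
    return {a: s for a, s in _CLUSTER.items() if s in states}


result = states_for_tracked(["app_1", "app_2"],
                            {"RUNNING", "FAILED"}, fake_fetch)
```

The trade-off is that the response may include apps that are not tracked
(`app_3` here), so the filter choice decides how much extra data comes back.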


Best,

Nan



On Wed, Aug 16, 2017 at 12:19 PM, Marcelo Vanzin <[email protected]>
wrote:

> On Wed, Aug 16, 2017 at 12:02 PM, Nan Zhu <[email protected]> wrote:
> > Then which API would you use for *current* apps? I think you have to define
> > *current* with applicationIds. If that's true, you have to call
> > https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#line.181
>
> What do you mean by "current"?
>
> Both the API you linked to and the API I linked to give you "current"
> apps. The one you linked to gives you all "current" apps regardless of
> state. The one I linked to allows you to define which states you're
> interested in. So if you're interested in transitions from RUNNING to
> FAILED, for example, you need to monitor all apps with both states
> RUNNING and FAILED, which that API allows you to do.
>
> There's no need to make N requests as you mentioned.
>
> The question is whether it's cheaper to make a single large request to
> YARN or N small requests. If you are monitoring 4 or 5 applications it
> probably doesn't matter, but if you're monitoring 1000 applications
> that are starting up concurrently, I have a feeling that getting all
> of that information in a single call will be easier on YARN.
>
>
> > If I didn't miss anything, there is no API that accepts a list of app IDs. As
> > a result, you have to fire N requests (N is the number of current apps) to
> > YARN.
> >
> > Then the solution becomes using a single thread to fire N requests instead
> > of using M threads to fire N requests (ideally M << N).
> >
> >
> >
> >
> > On Wed, Aug 16, 2017 at 11:41 AM, Marcelo Vanzin <[email protected]>
> > wrote:
> >
> >> On Wed, Aug 16, 2017 at 11:34 AM, Nan Zhu <[email protected]> wrote:
> >> > Yes, I know there is such an API. What I don't understand is what I should
> >> > pass to the filtering API you mentioned. Say we query YARN every 5
> >> > ticks:
> >> >
> >> > 0: Query and get App A is running
> >> >
> >> > 4: App A is done
> >> >
> >> > 5: Query... so what should I fill in as filtering parameters at 5 to capture
> >> > the change in App A's state?
> >>
> >> You don't query for app state *changes*. You query for the current app
> >> state, and compare against what you have, and then you can detect
> >> changes that way. The trick is how to filter to get the information
> >> you want, so you limit how much data you request from YARN.
> >>
> >> I'm not aware of any YARN API to query for state changes like that. So
> >> even in the individual request case, you'd have to get app A's state,
> >> and update the Livy handle if the state has changed from what was
> >> previously known.
> >>
> >> That's most probably why Meisam's PR only filters by app type. If
> >> there are further filters that can be applied, then great, but you
> >> still need logic in Livy to detect the state changes you want.
> >>
> >> > If you look at Meisam's PR, they can only filter based on appType
> >> > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75
> >>
> >>
> >> --
> >> Marcelo
> >>
>
>
>
> --
> Marcelo
>
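
The "query the current state and compare against what you have" approach
described in the quoted thread could look roughly like the sketch below. It
is an illustration under assumptions, not Livy's implementation: the
`known` map stands in for whatever Livy keeps in memory, and each
`snapshot` stands in for the result of one poll of YARN. For the
transition to be visible, the poll's state filter has to include the
terminal state as well as RUNNING.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, str, str]  # (app_id, old_state, new_state)


def detect_transitions(known: Dict[str, str],
                       snapshot: Dict[str, str]) -> List[Transition]:
    """Compare a fresh snapshot with the last known states.

    `known` is updated in place so that the next poll compares against
    the latest values. Apps absent from `known` are ignored, and apps
    absent from the snapshot are left untouched.
    """
    changes = []
    for app_id, new_state in snapshot.items():
        old_state = known.get(app_id)
        if old_state is not None and old_state != new_state:
            changes.append((app_id, old_state, new_state))
            known[app_id] = new_state
    return changes


# The scenario from the thread: App A is running at tick 0 and done by tick 5.
known = {"app_A": "RUNNING"}
first = detect_transitions(known, {"app_A": "RUNNING"})    # poll at tick 0
second = detect_transitions(known, {"app_A": "FINISHED"})  # poll at tick 5
```

The first poll reports nothing and the second reports the change from
RUNNING to FINISHED, without YARN having to expose a "state changes" API.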
