Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

Marcelo Vanzin Wed, 16 Aug 2017 12:20:28 -0700

On Wed, Aug 16, 2017 at 12:02 PM, Nan Zhu <[email protected]> wrote:
> Then which API would you use for *current* apps? I think you have to define
> *current* with applicationIds? If that's true, you have to call
> https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#line.181

What do you mean by "current"?

Both the API you linked to and the API I linked to give you "current"
apps. The one you linked to gives you all "current" apps regardless of
state. The one I linked to allows you to define which states you're
interested in. So if you're interested in transitions from RUNNING to
FAILED, for example, you need to query for all apps that are in either
the RUNNING or the FAILED state, which that API allows you to do.
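
As a rough illustration of the approach described above (fetch the current apps in the states of interest, then compare against what is already known), here is a minimal Python sketch. It is not Livy code: `detect_transitions`, the `list_apps` callable, and the app ids are all hypothetical, and `list_apps` merely stands in for whatever state-filtered YARN listing call is used.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for a state-filtered YARN listing call:
# given the states of interest, return {app_id: state} for matching apps.
ListApps = Callable[[Set[str]], Dict[str, str]]


def detect_transitions(
    known: Dict[str, str],
    list_apps: ListApps,
    states: Iterable[str] = ("RUNNING", "FAILED"),
) -> List[Tuple[str, str, str]]:
    """Poll once, diff against `known`, return (app_id, old, new) changes.

    Only apps already present in `known` are tracked. `known` is updated
    in place so that the next poll diffs against fresh data.
    """
    current = list_apps(set(states))  # a single request for all apps
    changes = []
    for app_id, new_state in current.items():
        if app_id not in known:
            continue  # not an app we are monitoring
        old_state = known[app_id]
        if old_state != new_state:
            changes.append((app_id, old_state, new_state))
            known[app_id] = new_state
    return changes


if __name__ == "__main__":
    # Fake cluster state, for illustration only.
    cluster = {"app_A": "RUNNING", "app_B": "RUNNING"}

    def fake_list(wanted: Set[str]) -> Dict[str, str]:
        return {a: s for a, s in cluster.items() if s in wanted}

    known = dict(cluster)
    print(detect_transitions(known, fake_list))
    cluster["app_A"] = "FAILED"
    print(detect_transitions(known, fake_list))
```

Note that an app which moves to a state outside the queried set simply disappears from the result, so the set of states passed in has to cover every transition target that matters.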

There's no need to make N requests as you mentioned.

The question is whether it's cheaper to make a single large request to
YARN or N small requests. If you are monitoring 4 or 5 applications it
probably doesn't matter, but if you're monitoring 1000 applications
that are starting up concurrently, I have a feeling that getting all
of that information in a single call will be easier on YARN.
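
To make the trade-off above concrete, the following Python sketch counts round trips for the two strategies against an in-memory fake. `FakeYarn`, `poll_individually`, and `poll_in_bulk` are hypothetical names invented for this illustration, not YARN or Livy APIs. The sketch only counts requests; it says nothing about how expensive one large response is for YARN compared with many small ones.

```python
from typing import Dict, Iterable, Set


class FakeYarn:
    """Hypothetical in-memory stand-in for a YARN client; counts requests."""

    def __init__(self, apps: Dict[str, str]) -> None:
        self.apps = apps
        self.requests = 0

    def get_state(self, app_id: str) -> str:
        # One round trip per application.
        self.requests += 1
        return self.apps[app_id]

    def list_states(self, states: Set[str]) -> Dict[str, str]:
        # One round trip covering every application in the given states.
        self.requests += 1
        return {a: s for a, s in self.apps.items() if s in states}


def poll_individually(yarn: FakeYarn, app_ids: Iterable[str]) -> Dict[str, str]:
    """N requests: ask for each monitored app separately."""
    return {a: yarn.get_state(a) for a in app_ids}


def poll_in_bulk(
    yarn: FakeYarn, app_ids: Iterable[str], states: Iterable[str]
) -> Dict[str, str]:
    """One request: list by state, then keep only the monitored apps."""
    wanted = set(app_ids)
    listed = yarn.list_states(set(states))
    return {a: s for a, s in listed.items() if a in wanted}


if __name__ == "__main__":
    apps = {f"app_{i}": "RUNNING" for i in range(1000)}
    per_app, bulk = FakeYarn(apps), FakeYarn(apps)
    poll_individually(per_app, apps)
    poll_in_bulk(bulk, apps, ["RUNNING", "FAILED"])
    print(per_app.requests, bulk.requests)
```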

> If I didn't miss anything, there is no API to pass in a list of app ids. As
> a result, you have to fire N requests (N is the number of current apps) to
> YARN.
>
> Then the solution becomes using a single thread to fire N requests instead
> of using M threads to fire N requests (ideally M << N).
>
>
>
>
> On Wed, Aug 16, 2017 at 11:41 AM, Marcelo Vanzin <[email protected]>
> wrote:
>
>> On Wed, Aug 16, 2017 at 11:34 AM, Nan Zhu <[email protected]> wrote:
>> > Yes, I know there is such an API. What I don't understand is what I should
>> > pass to the filtering API you mentioned. Say we query YARN every 5
>> > ticks:
>> >
>> > 0: Query and get App A is running
>> >
>> > 4: App A is done
>> >
>> > 5: Query...so what should I fill in as filtering parameters at 5 to capture
>> > the change in App A's state?
>>
>> You don't query for app state *changes*. You query for the current app
>> state, and compare against what you have, and then you can detect
>> changes that way. The trick is how to filter to get the information
>> you want, so you limit how much data you request from YARN.
>>
>> I'm not aware of any YARN API to query for state changes like that. So
>> even in the individual request case, you'd have to get app A's state,
>> and update the Livy handle if the state has changed from what was
>> previously known.
>>
>> That's most probably why Meisam's PR only filters by app type. If
>> there are further filters that can be applied, then great, but you
>> still need logic in Livy to detect the state changes you want.
>>
>> > If you look at Meisam's PR, they can only filter based on appType
>> > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75
>>
>>
>> --
>> Marcelo
>>

-- 
Marcelo
