I am using your word *current*. What is the definition of "current" in Livy? I think it means every application that still has some record kept in the Livy process's memory.
So:

1. How do you express this "current" in a query to YARN? I think you have to use the ApplicationId (maybe there are other ways) in the query.

2. The problem is that I didn't see an API for making such a "big call" by passing in all the applications' IDs.

Best,

Nan

On Wed, Aug 16, 2017 at 12:19 PM, Marcelo Vanzin <[email protected]> wrote:

> On Wed, Aug 16, 2017 at 12:02 PM, Nan Zhu <[email protected]> wrote:
> > Then which API would you use for *current* apps? I think you have to
> > define *current* with applicationIds? If that's true, you have to call
> > https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#line.181
>
> What do you mean by "current"?
>
> Both the API you linked to and the API I linked to give you "current"
> apps. The one you linked to gives you all "current" apps regardless of
> state. The one I linked to allows you to define which states you're
> interested in. So if you're interested in transitions from RUNNING to
> FAILED, for example, you need to monitor all apps with both states
> RUNNING and FAILED, which that API allows you to do.
>
> There's no need to make N requests as you mentioned.
>
> The question is whether it's cheaper to make a single large request to
> YARN or N small requests. If you are monitoring 4 or 5 applications it
> probably doesn't matter, but if you're monitoring 1000 applications
> that are starting up concurrently, I have a feeling that getting all
> of that information in a single call will be easier on YARN.
> >
> > If I didn't miss anything, there is no API to pass in a list of app
> > ids; as a result, you have to fire N requests (N is the number of
> > current apps) to YARN.
> >
> > Then the solution becomes using a single thread to fire N requests
> > instead of using M threads to fire N requests (ideally M << N).
> >
> > On Wed, Aug 16, 2017 at 11:41 AM, Marcelo Vanzin <[email protected]>
> > wrote:
> >
> >> On Wed, Aug 16, 2017 at 11:34 AM, Nan Zhu <[email protected]> wrote:
> >> > Yes, I know there is such an API. What I don't understand is what I
> >> > should pass to the filtering API you mentioned. Say we query YARN
> >> > every 5 ticks:
> >> >
> >> > 0: Query and get App A is running
> >> >
> >> > 4: App A is done
> >> >
> >> > 5: Query... so what should I fill in as filtering parameters at 5
> >> > to capture the change in App A's state?
> >>
> >> You don't query for app state *changes*. You query for the current app
> >> state, and compare against what you have, and then you can detect
> >> changes that way. The trick is how to filter to get the information
> >> you want, so you limit how much data you request from YARN.
> >>
> >> I'm not aware of any YARN API to query for state changes like that. So
> >> even in the individual request case, you'd have to get app A's state,
> >> and update the Livy handle if the state has changed from what was
> >> previously known.
> >>
> >> That's most probably why Meisam's PR only filters by app type. If
> >> there are further filters that can be applied, then great, but you
> >> still need logic in Livy to detect the state changes you want.
> >>
> >> > If you look at Meisam's PR, they can only filter based on appType
> >> > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75
> >>
> >> --
> >> Marcelo
> >
> --
> Marcelo
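
For reference, the "query the current state and compare against what you have" approach from the quoted thread could look roughly like the sketch below. It is in Python for brevity (Livy itself is Scala) and is not Livy code. `fetch_snapshot` is a hypothetical stand-in for one bulk YARN request, for example a `YarnClient.getApplications` call filtered by application type and state, and `detect_changes` and `poll_once` are illustrative names. The state strings mirror YARN's `YarnApplicationState` values.

```python
from typing import Callable, Dict, List, Tuple

# (app_id, old_state, new_state)
Change = Tuple[str, str, str]


def detect_changes(tracked: Dict[str, str], snapshot: Dict[str, str]) -> List[Change]:
    """Return a change record for every tracked app whose state differs in the snapshot.

    Apps that are tracked but missing from the snapshot are skipped, and apps
    that appear only in the snapshot (not started by us) are ignored.
    """
    changes = []
    for app_id, old_state in tracked.items():
        new_state = snapshot.get(app_id)
        if new_state is not None and new_state != old_state:
            changes.append((app_id, old_state, new_state))
    return changes


def poll_once(tracked: Dict[str, str],
              fetch_snapshot: Callable[[], Dict[str, str]]) -> List[Change]:
    """Make ONE bulk request, diff it against the tracked states, and update them."""
    snapshot = fetch_snapshot()  # hypothetical: a single large call instead of N small ones
    changes = detect_changes(tracked, snapshot)
    for app_id, _old, new_state in changes:
        tracked[app_id] = new_state
    return changes


# Usage with a fake snapshot source (assumption: app id -> state name).
tracked_apps = {"app_A": "RUNNING", "app_B": "ACCEPTED"}
fake_yarn = lambda: {"app_A": "FINISHED", "app_B": "ACCEPTED", "app_C": "RUNNING"}
for app_id, old, new in poll_once(tracked_apps, fake_yarn):
    print(f"{app_id}: {old} -> {new}")
```

The filter on the bulk call only limits how much data comes back. The change detection itself happens in the diff, which is the logic that would still have to live in Livy.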
