On Wed, Aug 16, 2017 at 11:17 AM, Nan Zhu <[email protected]> wrote:
> Looks like non-REST API also contains this
> https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/api/YarnClient.html#line.225
>
> my concern which was skipped in your last email (again) is that, how many
> app states we want to fetch through this API. What I can see is we cannot
> filter applications since application state can change between two polls,
> any thoughts?
I didn't skip it. I'm intentionally keeping the discussion high level because there's no code here to compare. It's purely a "multiple requests for single app state" vs. "single request for multiple applications' statuses" discussion.

The bulk API I suggested you investigate should be able to support enough filtering so that Livy only gets the information it needs (maybe with a little extra noise). It shouldn't get every single YARN application ever run, for example.

This method is more what I was thinking of:

    public abstract List<ApplicationReport> getApplications(
        Set<String> applicationTypes,
        EnumSet<YarnApplicationState> applicationStates) throws YarnException,
        IOException;

It lets you query apps with a given type and multiple states that you're interested in. It's not optimal (doesn't let you filter by tags, for example), but it's better than getting all apps.

Maybe that's not enough either, but you're proposing the changes, so please explain why that is not enough instead of just throwing the question back at me.

-- 
Marcelo
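For illustration only, here is a rough sketch of the "single request for multiple applications' statuses" idea. It does not use Hadoop: AppState, Report, and the static getApplications below are hypothetical stand-ins for YarnApplicationState, ApplicationReport, and YarnClient.getApplications(types, states), and the "SPARK" type string and the app IDs are made up. One possible way around the "state can change between two polls" concern, assumed here rather than settled in this thread, is to ask for all states of one application type and then keep only the IDs being tracked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class BulkPollSketch {
    // Hypothetical stand-in for a subset of YarnApplicationState.
    public enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // Hypothetical stand-in for ApplicationReport (only the fields used here).
    public static final class Report {
        public final String appId;
        public final String type;
        public final AppState state;

        public Report(String appId, String type, AppState state) {
            this.appId = appId;
            this.type = type;
            this.state = state;
        }
    }

    // Stand-in for the bulk call: filters an in-memory "cluster" by type and state.
    public static List<Report> getApplications(
            List<Report> cluster, Set<String> types, EnumSet<AppState> states) {
        List<Report> out = new ArrayList<>();
        for (Report r : cluster) {
            if (types.contains(r.type) && states.contains(r.state)) {
                out.add(r);
            }
        }
        return out;
    }

    // One bulk poll: a single request for every state of the assumed "SPARK" type,
    // then a client-side filter down to the application IDs being tracked.
    public static Map<String, AppState> pollOnce(List<Report> cluster, Set<String> trackedIds) {
        List<Report> reports = getApplications(
                cluster, Collections.singleton("SPARK"), EnumSet.allOf(AppState.class));
        Map<String, AppState> latest = new TreeMap<>();
        for (Report r : reports) {
            if (trackedIds.contains(r.appId)) {
                latest.put(r.appId, r.state);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Report> cluster = Arrays.asList(
                new Report("app_1", "SPARK", AppState.RUNNING),
                new Report("app_2", "SPARK", AppState.FINISHED),
                new Report("app_3", "MAPREDUCE", AppState.RUNNING),
                new Report("app_4", "SPARK", AppState.RUNNING));
        Set<String> tracked = new HashSet<>(Arrays.asList("app_1", "app_2"));
        System.out.println(pollOnce(cluster, tracked));
    }
}
```

The extra entries returned for untracked apps (app_4 above) are the "little extra noise"; whether that noise is acceptable on a busy cluster is exactly the open question.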
