As time goes on, the reply from YARN can only get larger. Given a consistent workload pattern, the cost of one large query can eventually exceed the cost of individual requests.

I would say go with individual requests + a thread pool, or one large batch for all apps, first; if any performance issue is observed, add the optimization on top of it.

Regarding how to optimize: the major issue is that the YarnClient API is a simplified version of the REST APIs, with fewer filtering parameters. I looked at the usage of YarnClient in the current implementation (only Livy-Server); only the SparkYarnApp class uses it. Since there will be a big refactoring of this class, replacing YarnClient with a home-made RESTful client might not be that costly.

*Multiple individual requests:* batch the individual requests based on submission time.

*A single large request:* limit the number of fetched app statuses with, e.g., application submission time or a limit, which are only available with the REST APIs.

However, even with the REST API there are some corner cases, e.g. a long-running app lasting for days (training some models) alongside short ones which last only for minutes.

Best,

Nan

On Wed, Aug 16, 2017 at 1:01 PM, Marcelo Vanzin <[email protected]> wrote:

> On Wed, Aug 16, 2017 at 12:57 PM, Nan Zhu <[email protected]> wrote:
> > yes, we finally converge on the idea
> >
> > how large the reply can be? if I have only one running applications and I
> > still need to fetch 1000
> >
> > on the other side
> >
> > I have 1000 running apps, what's the cost of sending 1000 requests even the
> > thread pool and yarn client are shared?
>
> I don't know the answers, but I'm asking you, since you are proposing
> the design, to consider that as an option, since it does not seem like
> you considered that tradeoff when suggesting your current approach.
>
> My comments about filtering are targeted at making things better in
> your first case; if there's really only one app being monitored, and
> you can figure out a filter that returns let's say 50 apps instead of
> 1000 that may be monitored by YARN, then you can do that.
>
> Or maybe you can go with a hybrid approach, where you use individual
> requests but past a certain threshold you fall back to bulk requests
> to avoid overloading YARN.
>
> Again, I'm asking you to consider alternatives that are not mentioned
> in your design document, because I identified potential performance
> issues in the current approach.
>
>
> --
> Marcelo
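
For reference, the hybrid approach from the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name, the threshold parameter, and the two fetch callbacks are all hypothetical and do not come from Livy or from the YarnClient API. In a real implementation the callbacks would wrap whatever single-app and bulk lookups the chosen client provides.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Livy or in the YARN client library.
final class HybridAppStatusPoller {
    // At or below this many monitored apps, issue one request per app.
    private final int bulkThreshold;
    // Assumed callback: app id -> state, costing one request per call.
    private final Function<String, String> fetchOne;
    // Assumed callback: one request returning the state of every app YARN reports.
    private final Supplier<Map<String, String>> fetchAll;

    HybridAppStatusPoller(int bulkThreshold,
                          Function<String, String> fetchOne,
                          Supplier<Map<String, String>> fetchAll) {
        this.bulkThreshold = bulkThreshold;
        this.fetchOne = fetchOne;
        this.fetchAll = fetchAll;
    }

    // Returns the state of each monitored app, choosing the request style by count.
    Map<String, String> poll(List<String> monitoredAppIds) {
        Map<String, String> result = new HashMap<>();
        if (monitoredAppIds.size() <= bulkThreshold) {
            // Few apps: individual requests keep each reply small.
            for (String id : monitoredAppIds) {
                result.put(id, fetchOne.apply(id));
            }
        } else {
            // Many apps: one bulk request, then keep only the apps we monitor.
            Map<String, String> all = fetchAll.get();
            for (String id : monitoredAppIds) {
                String state = all.get(id);
                if (state != null) {
                    result.put(id, state);
                }
            }
        }
        return result;
    }
}
```

The threshold is the tuning knob here: it would have to be picked from measurements, since neither message in this thread gives numbers for the cost of either request style.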
