On Wed, Aug 16, 2017 at 2:09 PM, Nan Zhu <[email protected]> wrote:
> With time goes, the reply from YARN can only be larger and larger. Given
> the consistent workload pattern, the cost of a large query can be
> eventually larger than individual request

That's where filtering would help, if it's possible to do it easily.

> I would say go with individual request + thread pool or large batch for
> all first, if any performance issue is observed, add the optimization on
> top of it

How about doing some experiments? You seem to have spent time with your
proposed approach, so I believe there's at least some kind of prototype
you're working on. It should be easy to get average latency for each
request and throughput for different thread counts.

You can do a crude approximation of what it would take to get the same
data in bulk by hitting the REST API with curl; no need to write code,
and you get both an idea of latency and of the size of the bulk replies.

-- 
Marcelo
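
For the per-request side of the experiment suggested above, a rough sketch of a timing harness might look like the following. Everything here is an assumption made for illustration: `benchmark`, `timed_call`, and `fake_fetch` are hypothetical names, and `fake_fetch` just sleeps instead of talking to YARN, so it would need to be swapped for whatever call the prototype actually makes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed_call(fetch, item):
    """Run fetch(item) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fetch(item)
    return time.perf_counter() - start


def benchmark(fetch, items, thread_counts):
    """Return average latency and throughput of fetch() for each pool size."""
    results = {}
    for n in thread_counts:
        wall_start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            latencies = list(pool.map(lambda it: timed_call(fetch, it), items))
        wall = time.perf_counter() - wall_start
        results[n] = {
            "avg_latency_s": mean(latencies),
            "throughput_rps": len(items) / wall,
        }
    return results


def fake_fetch(app_id):
    # Hypothetical stand-in for one per-application request;
    # replace with the real call made by the prototype.
    time.sleep(0.01)


if __name__ == "__main__":
    app_ids = [f"app_{i}" for i in range(40)]  # made-up identifiers
    for n, stats in benchmark(fake_fetch, app_ids, [1, 4, 8]).items():
        print(n, stats)
```

Running it with a few thread counts gives the two numbers mentioned above (average latency per request and overall throughput), which can then be compared against the time and reply size seen for a single bulk query.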
