On Wed, Aug 16, 2017 at 2:09 PM, Nan Zhu <[email protected]> wrote:
> As time goes on, the reply from YARN can only get larger and larger.
> Given the consistent workload pattern, the cost of one large query can
> eventually exceed that of individual requests

That's where filtering would help, if it's possible to do it easily.

> I would say go with individual requests + a thread pool, or one large
> batch for everything, first; if any performance issue is observed, add
> the optimization on top of it

How about doing some experiments?

You seem to have spent time with your proposed approach, so I believe
there's at least some kind of prototype you're working on. It should
be easy to get average latency for each request and throughput for
different thread counts.
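
Something along these lines would do as a starting point. It is only a
sketch: fake_request is a made-up placeholder that sleeps instead of
talking to YARN, and the thread counts and number of ids are arbitrary,
so swap in whatever call your prototype actually makes.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(app_id):
    """Hypothetical stand-in for one per-application status call."""
    time.sleep(0.01)  # simulated round trip; replace with the real call
    return app_id


def timed(request, app_id):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    request(app_id)
    return time.perf_counter() - start


def benchmark(request, app_ids, thread_counts):
    """Return {thread_count: (mean latency in s, throughput in req/s)}."""
    results = {}
    for n in thread_counts:
        wall_start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            latencies = list(pool.map(lambda a: timed(request, a), app_ids))
        wall = time.perf_counter() - wall_start
        results[n] = (statistics.mean(latencies), len(app_ids) / wall)
    return results


if __name__ == "__main__":
    ids = [f"app_{i}" for i in range(40)]
    for n, (lat, tput) in benchmark(fake_request, ids, [1, 4, 16]).items():
        print(f"{n:>2} threads: {lat * 1000:.1f} ms avg, {tput:.0f} req/s")
```

Per-request latency is measured inside each worker and throughput from
the wall clock around the whole pool, so the two numbers can be compared
across thread counts.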

You can do a crude approximation of what it would take to get the same
data in bulk by hitting the REST API with curl; no need to write code,
and you get both an idea of latency and of the size of the bulk
replies.
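
If you would rather script that than eyeball curl output, a few lines
are enough. This is a sketch under assumptions: rm-host:8088 is a
placeholder for your ResourceManager address, and the states filter is
just one example of narrowing the bulk reply.

```python
import time
import urllib.request

# Assumption: rm-host:8088 is a placeholder, not a real address.
# /ws/v1/cluster/apps is the YARN ResourceManager application-list
# endpoint; `states` is one of the query filters it accepts.
BULK_URL = "http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING"


def measure(url, timeout=30):
    """Fetch url once; return (latency in seconds, body size in bytes)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return time.perf_counter() - start, len(body)


# Example use against a live cluster:
#   latency, size = measure(BULK_URL)
#   print(f"{latency * 1000:.1f} ms, {size} bytes")
```

Running it with and without the filter gives a rough idea of how much
filtering shrinks the reply.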

-- 
Marcelo
