On Wed, Aug 16, 2017 at 9:06 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>> I'm not really sure what you're talking about here, since I did not
>> suggest a "shared data structure", and I'm not really sure what that
>> means in this context.
>
> What you claimed is just monitoring/updating the state with a single thread
> *given* all applications have been there.

What I proposed is having a single request to YARN to get all
applications' statuses, if that's possible. You'd still have multiple
application handles that are independent of each other. They'd all be
updated separately from that one thread talking to YARN.
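
A minimal sketch of that shape, in Python only for brevity. All names
here (AppHandle, BulkPoller, fetch_all) are hypothetical, and fetch_all
stands in for a bulk YARN call that may or may not exist; this is not
Livy code.

```python
import threading
from typing import Callable, Dict


class AppHandle:
    """Independent per-application handle; each one owns its own state."""

    def __init__(self, app_id: str) -> None:
        self.app_id = app_id
        self.state = "UNKNOWN"

    def update(self, state: str) -> None:
        self.state = state


class BulkPoller:
    """One thread, one bulk request per tick, many independent handles."""

    def __init__(self, fetch_all: Callable[[], Dict[str, str]],
                 interval_s: float) -> None:
        # fetch_all is an assumption: a single call returning
        # {app_id: state} for every application known to the cluster.
        self._fetch_all = fetch_all
        self._interval_s = interval_s
        # Registry of handles to notify; status lives in the handles.
        self._handles: Dict[str, AppHandle] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def register(self, handle: AppHandle) -> None:
        with self._lock:
            self._handles[handle.app_id] = handle

    def poll_once(self) -> None:
        statuses = self._fetch_all()  # the only request sent this tick
        with self._lock:
            handles = list(self._handles.values())
        for handle in handles:
            if handle.app_id in statuses:
                handle.update(statuses[handle.app_id])

    def run(self) -> None:
        # Poll at a constant frequency until stop() is called.
        while not self._stop.wait(self._interval_s):
            self.poll_once()

    def stop(self) -> None:
        self._stop.set()
```

Each handle is updated on its own; the poller only decides when the one
request goes out.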

This has nothing to do with a "shared data structure". There's no
shared data structure here to track application status.

>> Yes. While there are applications that need monitoring, you poll YARN
>> at a constant frequency. Basically what would be done by multiple
>> threads, but there's a single one.
>
> Did you find the bulk API?

No, but I suggested that you check whether one exists, since I think
that's a better solution from both YARN's and Livy's perspectives: it
requires fewer resources. It should at least be mentioned as an
alternative in your mini-spec and, if it doesn't work for whatever
reason, that deserves an explanation.

>> Why not. The expensive part is not parsing results, I'll bet, but
>> having a whole bunch of different tasks opening and closing YARN
>> connections.
>
> First, YARNClient is thread safe and can be shared by multiple threads....

Irrelevant.

> Second, if I have 1000 applications, what's your expectation for the
> following cases?
>
> 1. YARN processed the request for 999 and failed on the last one for some reason
>
> 2. Livy received 999 well-formatted responses but got 1 malformed response

What if YARN goes down? What if your datacenter has a massive power failure?

You have to handle errors in any scenario.
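
As a rough illustration of both failure modes with a single bulk
request, here is a Python sketch. The function apply_bulk_result and the
fetch_all callback are hypothetical; the state names are the values of
YARN's YarnApplicationState enum. Returning None for "no usable answer
this round" is just one possible policy, not a recommendation.

```python
from typing import Callable, Dict, Iterable, Optional

# Values of YARN's YarnApplicationState enum.
VALID_STATES = {
    "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
}


def apply_bulk_result(
    app_ids: Iterable[str],
    fetch_all: Callable[[], Dict[str, str]],
) -> Dict[str, Optional[str]]:
    """Map each app id to its state, or None if no usable answer arrived."""
    app_ids = list(app_ids)
    try:
        statuses = fetch_all()
    except Exception:
        # The whole request failed (e.g. YARN unreachable): nobody gets
        # an update this round, and the caller can retry on the next tick.
        return {app_id: None for app_id in app_ids}

    result: Dict[str, Optional[str]] = {}
    for app_id in app_ids:
        state = statuses.get(app_id)
        # A missing or malformed entry only affects that one application.
        result[app_id] = state if state in VALID_STATES else None
    return result
```

With this policy, one bad entry out of 1000 leaves the other 999 handles
updated as usual.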


-- 
Marcelo
