> What I proposed is having a single request to YARN to get all applications' statuses, if that's possible. You'd still have multiple application handles that are independent of each other. They'd all be updated separately from that one thread talking to YARN. This has nothing to do with a "shared data structure". There's no shared data structure here to track application status.
You are still avoiding the question of how you make all "application handles" accessible to this thread.

Please keep the discussion direct.

> No, but I suggested that you look whether that exists since I think that's a better solution both from YARN and Livy's perspectives, since it requires less resources. It should at least be mentioned as an alternative in your mini-spec and, if it doesn't work for whatever reason, deserves an explanation.

"I would investigate whether there's any API in YARN to do a bulk get of running applications with a particular filter;" (from your email)

If you suggest something, please find evidence to support it.

> Irrelevant.

Please keep the discussion direct.

> What if YARN goes down? What if your datacenter has a massive power failure? You have to handle errors in any scenario.

Again, I am describing one concrete scenario that comes up in any bulk operation; even if we go in the bulk direction, you have to handle it. Since you proposed this bulk operation, I am asking what you expect to happen in that case. But you are answering with hypotheticals that add no value.

Please keep the discussion direct.

On Wed, Aug 16, 2017 at 9:11 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Wed, Aug 16, 2017 at 9:06 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> >> I'm not really sure what you're talking about here, since I did not
> >> suggest a "shared data structure", and I'm not really sure what that
> >> means in this context.
> >
> > What you claimed is just monitoring/updating the state with a single
> > thread *given* all applications are already there.
>
> What I proposed is having a single request to YARN to get all
> applications' statuses, if that's possible. You'd still have multiple
> application handles that are independent of each other. They'd all be
> updated separately from that one thread talking to YARN.
>
> This has nothing to do with a "shared data structure". There's no
> shared data structure here to track application status.
>
> >> Yes. While there are applications that need monitoring, you poll YARN
> >> at a constant frequency. Basically what would be done by multiple
> >> threads, but there's a single one.
> >
> > Did you find the bulk API?
>
> No, but I suggested that you look whether that exists since I think
> that's a better solution both from YARN and Livy's perspectives, since
> it requires less resources. It should at least be mentioned as an
> alternative in your mini-spec and, if it doesn't work for whatever
> reason, deserves an explanation.
>
> >> Why not. The expensive part is not parsing results, I'll bet, but
> >> having a whole bunch of different tasks opening and closing YARN
> >> connections.
> >
> > First, YARNClient is thread safe and can be shared by multiple threads....
>
> Irrelevant.
>
> > Second, if I have 1000 applications, what's your expectation for the
> > following cases:
> >
> > 1. YARN processes the request for 999 applications and fails on the
> > last one for some reason.
> >
> > 2. Livy receives 999 well-formatted responses but 1 malformed response.
>
> What if YARN goes down? What if your datacenter has a massive power
> failure?
>
> You have to handle errors in any scenario.
>
>
> --
> Marcelo
>
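To make the two positions concrete, here is a minimal sketch in Python (not Livy code). It assumes a hypothetical `fetch_all` callable standing in for one bulk YARN query that returns one report per application, and a hypothetical lock-guarded `registry` dict through which handles become visible to the single polling thread. How that visibility should work is exactly the open question above, so treat this as one possible answer, not an agreed design. `poll_once` covers the error cases raised in the thread: if the whole request fails, every handle keeps its last known state and records the error; if one entry is missing or malformed, only that handle is affected.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class AppState(Enum):
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    KILLED = "KILLED"


@dataclass
class AppHandle:
    """Hypothetical per-application handle; each is updated independently."""
    app_id: str
    state: AppState = AppState.UNKNOWN
    last_error: Optional[str] = None


# Hypothetical stand-in for ONE bulk YARN query (not a real YARN API):
# returns an iterable of reports like {"id": "...", "state": "RUNNING"}.
FetchAll = Callable[[], Iterable[dict]]


def poll_once(handles: Dict[str, AppHandle], fetch_all: FetchAll) -> None:
    """One round trip to YARN, then an independent update per handle."""
    try:
        reports = list(fetch_all())
    except Exception as exc:
        # Whole request failed (e.g. YARN unreachable): keep last known
        # states, record the error, and let the next tick retry.
        for handle in handles.values():
            handle.last_error = f"bulk request failed: {exc}"
        return

    seen = set()
    for report in reports:
        app_id = report.get("id") if isinstance(report, dict) else None
        handle = handles.get(app_id)
        if handle is None:
            continue  # unparseable entry or an application we don't track
        seen.add(app_id)
        try:
            handle.state = AppState(report["state"])
            handle.last_error = None
        except (KeyError, ValueError):
            # Malformed entry: only this one handle is affected.
            handle.last_error = "malformed report"

    # Applications absent from the response (the "999 of 1000" case).
    for app_id, handle in handles.items():
        if app_id not in seen:
            handle.last_error = "missing from bulk response"


def monitor_loop(registry: Dict[str, AppHandle], lock: threading.Lock,
                 fetch_all: FetchAll, interval_s: float,
                 stop: threading.Event) -> None:
    """Single polling thread: snapshot the registry, poll, wait, repeat."""
    while not stop.is_set():
        with lock:  # registration code is assumed to take the same lock
            snapshot = dict(registry)
        poll_once(snapshot, fetch_all)
        stop.wait(interval_s)
```

Polling a snapshot means the lock is held only while copying the dict, so registering a new application never waits on a YARN round trip; the snapshot still holds the same `AppHandle` objects, so updates are visible to whoever owns each handle.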