The doc piques my interest, but I'd have to see some code before deciding
whether it's the best direction. Hopefully we'll have the new JIRA up by the
end of the week so you can file an issue there, but feel free to open a WIP
PR earlier if you're ready.
Alex Bozarth
Software Engineer
Spark Technology Center
E-mail: [email protected]
GitHub: github.com/ajbozarth
505 Howard Street
San Francisco, CA 94105
United States
From: Nan Zhu <[email protected]>
To: [email protected]
Date: 08/14/2017 02:35 PM
Subject: resolve the scalability problem caused by app monitoring in
livy with an actor-based design
Hi, all
In HDInsight, we (Microsoft) use Livy as the Spark job submission service.
We keep seeing customers run into problems when they submit many
concurrent applications to the system, or when Livy recovers from a state
with many concurrent applications.
By looking at the code and the customers' exception stacks, we narrowed the
problem down to the application monitoring module, where a new thread is
created for each application.
To resolve the issue, we propose an actor-based design for the application
monitoring module and share it here (as the new JIRA does not seem to be
working yet):
https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-nbTQTdDFXl2XQhXDiwA/edit?usp=sharing
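To illustrate the general idea only (this is not taken from the design doc
and is not Livy code), here is a minimal Python sketch of replacing one
monitoring thread per application with a single actor-style worker that
drains a mailbox of status messages. All names (AppMonitorActor, tell, demo)
are hypothetical and chosen just for this example.

```python
import queue
import threading


class AppMonitorActor:
    """Hypothetical actor: one worker thread serves every application."""

    def __init__(self):
        self.mailbox = queue.Queue()  # messages are (app_id, state) tuples
        self.states = {}              # app_id -> last reported state
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, app_id, state):
        """Enqueue a status update without blocking the caller."""
        self.mailbox.put((app_id, state))

    def stop(self):
        """Send a stop marker and wait until earlier messages are handled."""
        self.mailbox.put(None)
        self._thread.join()

    def _run(self):
        # Messages are processed one at a time, so no locking is needed
        # around self.states inside the actor.
        while True:
            msg = self.mailbox.get()
            if msg is None:
                break
            app_id, state = msg
            self.states[app_id] = state


def demo(num_apps=1000):
    """Report how many extra threads were used and how many apps are tracked."""
    before = threading.active_count()
    actor = AppMonitorActor()
    for i in range(num_apps):
        actor.tell(f"app-{i}", "RUNNING")
    extra_threads = threading.active_count() - before
    actor.stop()
    return extra_threads, len(actor.states)
```

In this sketch the number of threads stays constant no matter how many
applications are submitted, whereas a thread-per-application monitor grows
linearly with the number of applications.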
We would be glad to hear feedback from the community and improve the design
before we start implementing it!
Best,
Nan