Hi all,

In HDInsight, we (Microsoft) use Livy as the Spark job submission service.
We keep seeing customers run into problems when they submit many
concurrent applications to the system, or when Livy recovers from a state
with many concurrent applications.

By looking at the code and the customers' exception stack traces, we
narrowed the problem down to the application monitoring module, where a new
thread is created for each application.

To resolve the issue, we propose an actor-based design for the application
monitoring module and share it here (as the new JIRA does not seem to be
working yet):
https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-nbTQTdDFXl2XQhXDiwA/edit?usp=sharing
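
To make the contrast with thread-per-application concrete, here is a rough
sketch in Python. It is not the design from the document and not Livy code.
It only illustrates the general idea of a fixed pool of workers draining a
shared mailbox of "poll this application" messages, so the thread count
stays constant no matter how many applications are tracked. The names
AppMonitor, track, tick and poll_state are hypothetical and chosen only for
illustration.

```python
import queue
import threading

# Assumed set of terminal states, for illustration only.
TERMINAL_STATES = {"finished", "failed", "killed"}


class AppMonitor:
    """Tracks many applications with a fixed number of worker threads."""

    def __init__(self, poll_state, num_workers=4):
        # poll_state is a hypothetical callback: app_id -> state string.
        self._poll_state = poll_state
        self._mailbox = queue.Queue()
        self._lock = threading.Lock()
        self._states = {}
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def track(self, app_id):
        """Register an application; no new thread is created."""
        with self._lock:
            self._states.setdefault(app_id, "unknown")

    def tick(self):
        """Send one poll message per active application and wait for all."""
        with self._lock:
            active = [
                app_id
                for app_id, state in self._states.items()
                if state not in TERMINAL_STATES
            ]
        for app_id in active:
            self._mailbox.put(app_id)
        self._mailbox.join()

    def state(self, app_id):
        with self._lock:
            return self._states[app_id]

    def thread_count(self):
        return len(self._workers)

    def _run(self):
        # Each worker handles one message at a time from the shared mailbox.
        while True:
            app_id = self._mailbox.get()
            try:
                state = self._poll_state(app_id)
                with self._lock:
                    self._states[app_id] = state
            except Exception:
                # Keep the last known state; a real monitor would log this.
                pass
            finally:
                self._mailbox.task_done()
```

In this sketch a periodic timer would call tick(); tracking 100 or 10,000
applications still uses only num_workers threads, whereas one thread per
application grows without bound.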

We would be glad to hear feedback from the community and improve the design
before we start implementing it!

Best,

Nan
