Hi Meisam,

Many thanks for sending the PR.
My question, related to the ongoing discussion, is simply: "Have you seen any
performance issues in
https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 ?"
We have several scenarios in which a large volume of applications is
submitted to YARN every day, and the results to be fetched by this call can
easily accumulate.

Best,
Nan

On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi <meisam.fa...@gmail.com> wrote:

> Here are my two pennies on both designs (actor-based design vs.
> single-thread polling design).
>
> *Single-thread polling design*
> We implemented a single-thread polling mechanism for YARN here at PayPal.
> Our solution is more involved because we added many new features to Livy
> that we had to consider when we refactored Livy's YARN interface. But we
> are willing to rework our changes so they best suit the needs of the Livy
> community :-)
>
> *Actor-based design*
> It seems to me that the proposed actor-based design (
> https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-nbTQTdDFXl2XQhXDiwA/edit)
> needs a few more messages and actors. Here is why.
> Livy makes three (blocking) calls to YARN:
> 1. `yarnClient.getApplications`, which gives Livy `ApplicationId`s
> 2. `yarnClient.getApplicationAttemptReport(ApplicationId)`, which gives
> Livy `getAMContainerId`
> 3. `yarnClient.getContainerReport`, which gives Livy tracking URLs
>
> The result of each call is needed to make the next call. The proposed
> actor system needs to be designed to handle all of these blocking calls.
>
> I do agree that an actor-based design is cleaner and more maintainable,
> but we had to discard it because it adds more dependencies to Livy. We
> faced too many dependency-version-mismatch problems with Livy interactive
> sessions (when applications depend on a different version of a library
> that is used internally by Livy). If the Livy community prefers an
> actor-based design, we are willing to reimplement our changes with an
> actor system.
>
> Finally, either design is only the first step in fixing this particular
> scalability problem. The reason is that the *"yarnAppMonitorThread" is not
> the only thread that Livy spawns per Spark application.* For batch jobs,
> Livy
> 1. calls spark-submit, which launches a new JVM (an operation that is far
> heavier than creating a thread and can easily drain the system)
> 2. creates a thread that waits for the exit code of spark-submit. Even
> though this thread is "short-lived", at peak time thousands of such
> threads are created in a few seconds.
>
> I created a PR with our modifications:
>
> https://github.com/apache/incubator-livy/pull/36
>
> Thanks,
> Meisam
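[For readers skimming the thread: the three chained, blocking calls Meisam
describes above could look roughly like the sketch below, written against
the public Hadoop YarnClient API. This is purely illustrative and is not
the code in the PR; the object name, the chosen application states, and the
println are made up, error handling and loop scheduling are omitted, and
note that getApplicationAttemptReport actually takes an
ApplicationAttemptId, obtained here from the application report.]

    import java.util.EnumSet

    import scala.collection.JavaConverters._

    import org.apache.hadoop.yarn.api.records.YarnApplicationState
    import org.apache.hadoop.yarn.client.api.YarnClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration

    object YarnPollingSketch {
      def main(args: Array[String]): Unit = {
        val yarnClient = YarnClient.createYarnClient()
        yarnClient.init(new YarnConfiguration())
        yarnClient.start()

        // Call 1: one bulk request returns a report for every app in the
        // given states, instead of one request per monitored app.
        val apps = yarnClient.getApplications(
          EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING))

        for (app <- apps.asScala) {
          // Call 2: the attempt report carries the AM container id.
          val attempt = yarnClient.getApplicationAttemptReport(
            app.getCurrentApplicationAttemptId)
          // Call 3: the container report carries the AM log/tracking URL.
          val container = yarnClient.getContainerReport(attempt.getAMContainerId)
          println(s"${app.getApplicationId} -> ${container.getLogUrl}")
        }

        yarnClient.stop()
      }
    }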
> On Wed, Aug 16, 2017 at 10:42 AM Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
> > Hello,
> >
> > On Wed, Aug 16, 2017 at 10:35 AM, Arijit Tarafdar
> > <arij...@microsoft.com.invalid> wrote:
> > > 1. Additional copy of states in Livy which can be queried from YARN
> > > on request.
> >
> > Not sure I follow.
> >
> > > 2. The design is not event driven and may wastefully query YARN when
> > > no actual user/external request is pending.
> >
> > You don't need to keep querying YARN if there are no apps to monitor.
> >
> > > 3. There will always be an issue with stale data and update latency
> > > between the actual YARN state and the Livy state map.
> >
> > That is also the case with a thread pool that has fewer threads than
> > the number of apps being monitored, if making one request per app.
> >
> > > 4. The size and latency of the response when bulk querying YARN are
> > > unknown.
> >
> > That is also unknown when making multiple requests from multiple
> > threads, unless you investigate the internal implementation of both
> > the YARN clients and servers.
> >
> > > 5. The YARN bulk API needs to support filtering at the query level.
> >
> > Yes, I mentioned that in my original response and really was just
> > expecting Nan to do a quick investigation of that implementation
> > option.
> >
> > He finally did, and it seems that the API only exists through the REST
> > interface, so this is all moot.
> >
> > --
> > Marcelo
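[As a pointer for anyone revisiting this thread: the REST interface
Marcelo refers to is the ResourceManager's Cluster Applications API
(/ws/v1/cluster/apps), which does accept query-level filters such as
states and applicationTypes. A minimal, illustrative sketch of such a bulk
query follows; the ResourceManager address is a placeholder and JSON
parsing is left out.]

    import scala.io.Source

    object YarnRestBulkQuerySketch {
      def main(args: Array[String]): Unit = {
        // Placeholder address; point this at the cluster's ResourceManager web UI.
        val rm = "http://resourcemanager:8088"

        // The Cluster Applications REST API filters server-side, so a single
        // request can cover every monitored application.
        val url = s"$rm/ws/v1/cluster/apps?states=ACCEPTED,RUNNING&applicationTypes=SPARK"

        // The response is a JSON document with an "apps" -> "app" array of
        // application reports; parsing is omitted from this sketch.
        println(Source.fromURL(url).mkString)
      }
    }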