+1
On Fri, Sep 5, 2014 at 2:53 PM, Ufuk Celebi <[email protected]> wrote: > +1 > > > On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <[email protected]> > wrote: > > > +1 for refactoring using Akka, the arguments are overwhelming. > > > > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[email protected]> > > wrote: > > > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have > a > > > big community [1] and users [2] that depend on the system. > > > > > > The YARN client is also using the old RPC service. I would like to > > rewrite > > > it with Akka once we have added it into the other parts of the system, > to > > > learn it. > > > > > > > > > [1] https://github.com/akka/akka/pulls > > > [2] > > > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[email protected]> wrote: > > > > > > > This proposes to refactor the RPC service and the coordination > between > > > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > > > > > Even though Akka is written in Scala, it offers a Java interface and > we > > > can > > > > use Akka completely from Java. > > > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > > > > Problems with the current RPC service: > > > > -------------------------------------------------------- > > > > > > > > - No asynchronous calls with callbacks. This is the reason why > > several > > > > parts of the runtime poll the status, introducing unnecessary > latency. > > > > > > > > - No exception forwarding (many exceptions are simply swallowed), > > > making > > > > debugging and operation in flaky environments very hard > > > > > > > > - Limited number of handler threads. The RPC can only handle a fix > > > number > > > > of concurrent requests, forcing you to maintain separate thread pools > > to > > > > delegate actions to > > > > > > > > - No support for primitive data types (or boxed primitives) as > > > arguments, > > > > everything has to be a specially serializable type > > > > > > > > - Problematic threading model. The RPC continuously spawns and > > > terminates > > > > threads > > > > > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > > > - Akka solves all of the above issues out of the box > > > > > > > > - The supervisor model allows you to do failure detection of > actors. > > > That > > > > provides a unified way of detecting and handling failures (missing > > > > heartbeats, failed calls, ...) > > > > > > > > - Akka has tools to make stateful actors persistent and restart > them > > on > > > > other machines in cases of failure. That would greatly help in > > > implementing > > > > "master fail-over", which will become important > > > > > > > > - You can define many "call targets" (actors). Tasks (on > > taskmanagers) > > > > can directly call their ExecutionVertex on the JobManager, rather > than > > > > calling the JobManager, creating a Runnable that looks up the > execution > > > > vertex, and so on... > > > > > > > > - The actor model's approach to queue actions on an actor and run > the > > > one > > > > after another makes the concurrency model of the state machine very > > > simple > > > > and robust > > > > > > > > - We "outsource" our own concerns about maintaining and improving > > that > > > > part of the system > > > > > > > > Greetings, > > > > Stephan > > > > > > > > > >
