+1 for refactoring using Akka, the arguments are overwhelming.
On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[email protected]> wrote: > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have a > big community [1] and users [2] that depend on the system. > > The YARN client is also using the old RPC service. I would like to rewrite > it with Akka once we have added it into the other parts of the system, to > learn it. > > > [1] https://github.com/akka/akka/pulls > [2] > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[email protected]> wrote: > > > This proposes to refactor the RPC service and the coordination between > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > Even though Akka is written in Scala, it offers a Java interface and we > can > > use Akka completely from Java. > > > > Below are a list of arguments why this would help the system: > > > > > > Problems with the current RPC service: > > -------------------------------------------------------- > > > > - No asynchronous calls with callbacks. This is the reason why several > > parts of the runtime poll the status, introducing unnecessary latency. > > > > - No exception forwarding (many exceptions are simply swallowed), > making > > debugging and operation in flaky environments very hard > > > > - Limited number of handler threads. The RPC can only handle a fix > number > > of concurrent requests, forcing you to maintain separate thread pools to > > delegate actions to > > > > - No support for primitive data types (or boxed primitives) as > arguments, > > everything has to be a specially serializable type > > > > - Problematic threading model. The RPC continuously spawns and > terminates > > threads > > > > > > > > Benefits of switching to the Akka actor model: > > > > > ------------------------------------------------------------------------------- > > > > - Akka solves all of the above issues out of the box > > > > - The supervisor model allows you to do failure detection of actors. > That > > provides a unified way of detecting and handling failures (missing > > heartbeats, failed calls, ...) > > > > - Akka has tools to make stateful actors persistent and restart them on > > other machines in cases of failure. That would greatly help in > implementing > > "master fail-over", which will become important > > > > - You can define many "call targets" (actors). Tasks (on taskmanagers) > > can directly call their ExecutionVertex on the JobManager, rather than > > calling the JobManager, creating a Runnable that looks up the execution > > vertex, and so on... > > > > - The actor model's approach to queue actions on an actor and run the > one > > after another makes the concurrency model of the state machine very > simple > > and robust > > > > - We "outsource" our own concerns about maintaining and improving that > > part of the system > > > > Greetings, > > Stephan > > >
