Re: Proposal: Refactor distributed coordination to use the Akka Actor Library

Robert Metzger Fri, 05 Sep 2014 05:06:07 -0700

I agree with using Akka for RPC. It is licensed under the Apache License 2.0
and seems to have a big community [1] and users [2] that depend on it.


The YARN client is also using the old RPC service. I would like to rewrite
it with Akka once we have added it into the other parts of the system, to
learn it.


[1] https://github.com/akka/akka/pulls
[2] http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html



On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[email protected]> wrote:

> This proposes to refactor the RPC service and the coordination between
> Client, JobManager, and TaskManager to use the Akka actor library.
>
> Even though Akka is written in Scala, it offers a Java interface and we can
> use Akka completely from Java.
>
> Below is a list of arguments why this would help the system:
>
>
> Problems with the current RPC service:
> --------------------------------------------------------
>
>   - No asynchronous calls with callbacks. This is the reason why several
> parts of the runtime poll the status, introducing unnecessary latency.
>
>   - No exception forwarding (many exceptions are simply swallowed), making
> debugging and operation in flaky environments very hard
>
>   - Limited number of handler threads. The RPC can only handle a fixed
> number of concurrent requests, forcing you to maintain separate thread
> pools to delegate actions to
>
>   - No support for primitive data types (or boxed primitives) as arguments,
> everything has to be a specially serializable type
>
>   - Problematic threading model. The RPC continuously spawns and terminates
> threads
>
>
>
> Benefits of switching to the Akka actor model:
> --------------------------------------------------------
>
>   - Akka solves all of the above issues out of the box
>
>   - The supervisor model allows you to do failure detection of actors. That
> provides a unified way of detecting and handling failures (missing
> heartbeats, failed calls, ...)
>
>   - Akka has tools to make stateful actors persistent and restart them on
> other machines in cases of failure. That would greatly help in implementing
> "master fail-over", which will become important
>
>   - You can define many "call targets" (actors). Tasks (on taskmanagers)
> can directly call their ExecutionVertex on the JobManager, rather than
> calling the JobManager, creating a Runnable that looks up the execution
> vertex, and so on...
>
>   - The actor model's approach of queuing actions on an actor and running
> them one after another makes the concurrency model of the state machine
> very simple and robust
>
>   - We "outsource" our own concerns about maintaining and improving that
> part of the system
>
> Greetings,
> Stephan
>
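To make the points about asynchronous calls, exception forwarding, and
"run one action after another" more concrete, here is a minimal sketch in
plain Java. It does not use Akka and is not Akka's API: VertexActor,
updateState, and the single-threaded executor standing in for the mailbox
are all hypothetical names chosen for illustration. It only shows the
general idea under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for an actor; plain JDK, not Akka's API. */
final class VertexActor {
    // The single thread plays the role of the mailbox: submitted messages
    // queue up and are processed strictly one after another.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Only the mailbox thread touches this list, so no locking is needed.
    private final List<String> history = new ArrayList<>();

    /**
     * Asynchronous call: returns immediately. The future completes with the
     * number of states recorded so far, or exceptionally if the message is
     * rejected, so the caller sees the failure instead of it being swallowed.
     */
    CompletableFuture<Integer> updateState(String newState) {
        return CompletableFuture.supplyAsync(() -> {
            if (newState == null || newState.isEmpty()) {
                throw new IllegalArgumentException("empty state");
            }
            history.add(newState);
            return history.size();
        }, mailbox);
    }

    /** Stops the mailbox thread once the queued messages are processed. */
    void shutdown() {
        mailbox.shutdown();
    }
}
```

A caller can attach a callback with thenAccept or handle on the returned
future instead of polling, and a rejected message surfaces as an
exceptionally completed future. Because all messages run on one thread,
the state machine inside the actor needs no synchronization.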
