Re: Replacing JobManager with Scala implementation

Ufuk Celebi Wed, 03 Sep 2014 15:48:07 -0700

Hey Daniel,

On Wed, Sep 3, 2014 at 11:48 PM, Daniel Warneke <[email protected]> wrote:


> quite frankly, I still don’t understand what concrete problems in Flink we
> are trying to solve with introducing akka, or even worse, reimplementing
> the JobManager and TaskManager in Scala. In my opinion, it is crucial to
> clarify that before the vote starts.
>

I think we are all on the same page and Till started this thread to clarify
the issues. It's unfortunate that we are having two interdependent
discussions in one thread though.

1. Akka: The initial (orthogonal) issue that started this thread is the
question of whether to replace our current RPC system with Akka (
https://issues.apache.org/jira/browse/FLINK-1019). I think that the points
given by both Till and Stephan about Akka in this thread are valid
technical reasons for a transition. They highlight both problems with the
current RPC (threading, exception handling) and potential free lunches
(heartbeat, supervision).


> First, it is unclear to me why akka has such a strong standing in the
> project that we are seriously contemplating if it is worth to introduce the
> complexity of a second programming language to the very core of the system.
> An RPC service is a total commodity component these days. Any other RPC
> service could essentially do the job. Did somebody have a look at the
> alternatives (kryo, Netty, …)?
>

I'm not sure if Till or Stephan considered alternatives, but given the
above points Akka seems to be a good fit. Regarding Netty and Kryo:

- We have replaced the custom TCP network code of the system with Netty
some time ago and from my experience with Netty I don't think that it's a
good fit for replacing the RPC service as it won't solve the low-level
issues we are having right now. Instead it would just wrap them in a nice
library and add complexity for message queuing etc.

- Kryo is imo just a serialization framework and KryoNet would be a
competitor to Netty and not Akka.

> Second, I think it is also a misconception to think that the current RPC
> service is a major source of scalability and latency issues. Most of the
> scalability/latency problems we see arise from the currently rather
> complex/clumsy way of traversing Flink’s internal scheduling structures
> (i.e. the ExecutionGraph) upon status updates. The scheduling structure is
> inherently shared state at the moment, so unless somebody wants to
> reimplement it using actors and message passing, I don’t see how either
> akka or Scala could help us here.
>

I think we have already replaced parts of the ExecutionGraph lookup
structures to counter these problems. Stephan is currently working on
reworking the scheduler (https://issues.apache.org/jira/browse/FLINK-1030)
and if I'm not mistaken he has plans to make use of the Actor model for
this.

2. Scala: In order to make use of some of what Akka provides, the JobManager and
TaskManager need to be refactored to be Akka Actors. This refactoring can
be done in Java or Scala. The original question of this thread is whether
it would be OK to do it in Scala. Points both in favor and against this
have been raised here and led to the corresponding [VOTE] thread. Since the
[VOTE] is currently running, I would suggest moving the discussion of this
specific point there.
