[ https://issues.apache.org/jira/browse/FLINK-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032147#comment-14032147 ]
Stephan Ewen commented on FLINK-939:
------------------------------------
Daniel, I'd be very happy to have you on board for this :-) The current state
of the discussion is the following:
We plan to switch from the custom (Hadoop-inspired) RPC to Akka. The reason is
that we want a fast RPC that also works asynchronously (with futures), so that
we can get rid of the polling that happens in several places. The polling
latencies in the client, and when asking for the address of a remote endpoint
that is deployed lazily, currently eat up most of the local execution time in
our tests. Akka seems to be a good fit for that. Its actor system also handles
the heartbeats between nodes and lets you listen for failures and delays.
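To illustrate the difference between polling and futures, here is a minimal sketch in plain Java (it does not use Akka; `EndpointRegistry`, `lookup` and `register` are hypothetical names, not Flink or Akka APIs). A caller asking for the address of a lazily deployed task gets a `CompletableFuture` immediately, and that future completes the moment the task registers its address, so nobody sleeps in a polling loop:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical registry of task endpoint addresses. Callers receive a future
 * instead of repeatedly asking "is the task deployed yet?".
 */
class EndpointRegistry {

    // One future per task ID, created by whichever comes first: lookup or registration.
    private final ConcurrentHashMap<String, CompletableFuture<String>> addresses =
            new ConcurrentHashMap<>();

    private CompletableFuture<String> futureFor(String taskId) {
        return addresses.computeIfAbsent(taskId, id -> new CompletableFuture<>());
    }

    /** Non-blocking lookup: returns at once and completes when the task is deployed. */
    CompletableFuture<String> lookup(String taskId) {
        return futureFor(taskId);
    }

    /** Called once the lazily deployed task knows its address. */
    void register(String taskId, String address) {
        futureFor(taskId).complete(address);
    }

    public static void main(String[] args) throws Exception {
        EndpointRegistry registry = new EndpointRegistry();
        CompletableFuture<String> pending = registry.lookup("task-1");
        // React to completion instead of sleeping and retrying.
        pending.thenAccept(addr -> System.out.println("task-1 is at " + addr));
        registry.register("task-1", "10.0.0.5:6121");
        System.out.println(pending.get(1, TimeUnit.SECONDS));
    }
}
```

The latency win comes from the fact that the consumer is notified as soon as the address exists, rather than up to one polling interval later.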
Asterios Katsifodimos (https://github.com/asteriosk) has been working on this
over the past days/weeks.
The limitation in Akka is the maximum frame size of messages. We are looking
into different options to get around that; a "download" service for large
blobs is one of them. I personally would like to avoid a DFS dependency,
because that would mean more configuration (currently it runs very nicely out
of the box) and more latency (which we are trying to get down at the moment).
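One generic way around a per-message size limit is to split a large blob into chunks that each fit into a frame and reassemble them on the receiving side. This is a swapped-in illustration, not the download service mentioned above; `BlobChunker` and the 100,000-byte limit are hypothetical. (Assuming Akka's classic Netty-based remoting, the real limit is configured via `akka.remote.netty.tcp.maximum-frame-size`.)

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a blob into messages that each fit a frame limit. */
class BlobChunker {

    /** Splits data into consecutive chunks of at most maxChunkBytes bytes each. */
    static List<byte[]> split(byte[] data, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += maxChunkBytes) {
            int end = Math.min(data.length, offset + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Reassembles chunks on the receiver, in the order they were produced. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jar = new byte[300_000];              // stand-in for a large JAR file
        Arrays.fill(jar, (byte) 7);
        List<byte[]> frames = split(jar, 100_000);   // hypothetical per-message limit
        System.out.println(frames.size() + " chunks");
        System.out.println(Arrays.equals(jar, join(frames)));
    }
}
```

Chunking keeps each message under the limit, but the sender still holds the whole blob in memory, which is one argument for a dedicated service that streams the data instead.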
> Distribute required JAR files with separate service
> ---------------------------------------------------
>
> Key: FLINK-939
> URL: https://issues.apache.org/jira/browse/FLINK-939
> Project: Flink
> Issue Type: Improvement
> Reporter: Ufuk Celebi
> Assignee: Daniel Warneke
>
> Currently, required user JAR files are distributed via the RPC service in
> {{JobGraph.writeRequiredJarFiles(DataOutput, AbstractJobVertex[])}}. The RPC
> service then tries to allocate a buffer on the client-side heap to write the
> on-disk JAR, which [can lead to
> problems|https://github.com/apache/incubator-flink/pull/18].
> This should be replaced with a separate service.
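A separate service could avoid the large heap buffer by streaming the JAR in fixed-size pieces. The following is a minimal sketch under that assumption; `JarStreamer`, its length-prefixed wire format and the 64 KiB buffer size are hypothetical and not part of Flink:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical transfer helper: heap usage does not grow with the JAR size. */
class JarStreamer {

    static final int BUFFER_SIZE = 64 * 1024;

    /** Copies exactly length bytes from in to out through one fixed-size buffer. */
    static void copy(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n < 0) {
                throw new IOException("input ended " + remaining + " bytes early");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }

    /** Sender: writes an 8-byte length prefix, then streams the JAR contents. */
    static void send(InputStream jar, long length, DataOutputStream out) throws IOException {
        out.writeLong(length);
        copy(jar, out, length);
        out.flush();
    }

    /** Receiver: reads the length prefix, then copies that many bytes to target. */
    static long receive(DataInputStream in, OutputStream target) throws IOException {
        long length = in.readLong();
        copy(in, target, length);
        return length;
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = new byte[200_000];  // stand-in for an on-disk JAR file
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(new ByteArrayInputStream(jar), jar.length, new DataOutputStream(wire));

        // In a real service the target would be a FileOutputStream, not memory.
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        long received = receive(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())), copy);
        System.out.println(received + " bytes received");
    }
}
```

Memory use on each side stays at one 64 KiB buffer regardless of JAR size, provided the receiver writes to a file rather than, as in the demo, to a `ByteArrayOutputStream`.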
--
This message was sent by Atlassian JIRA
(v6.2#6252)