[ 
https://issues.apache.org/jira/browse/FLINK-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032147#comment-14032147
 ] 

Stephan Ewen commented on FLINK-939:
------------------------------------

Daniel, I'd be very happy to have you on board for this :-) The current state 
of the discussion is the following:

We plan to switch from the custom (Hadoop-inspired) RPC to Akka. The reason is 
that we want a fast RPC that also works asynchronously (with futures), in order 
to get away from the polling that happens in several places. The polling 
latencies in the client, and when asking for the address of a remote endpoint 
that is deployed lazily, currently eat up most of the local execution time in 
our tests. Akka seems to be a good fit for that. The actor system also handles 
the heartbeats between nodes and allows you to listen for failures and 
delays.
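
To illustrate the difference between polling and futures, here is a minimal 
sketch in plain Java. It is not Flink or Akka code: EndpointRegistry, lookup, 
deployed, and the address "host-a:6123" are hypothetical names made up for 
this example, and java.util.concurrent.CompletableFuture only stands in for 
whatever future type the RPC layer would expose. The caller gets a future back 
immediately and is notified when the lazily deployed endpoint becomes 
available, instead of asking repeatedly with a sleep in between.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

// Hypothetical registry of endpoint addresses (illustration only).
class EndpointRegistry {
    private final ConcurrentMap<String, CompletableFuture<String>> endpoints =
            new ConcurrentHashMap<>();

    // Returns immediately; the future completes once the endpoint is deployed.
    CompletableFuture<String> lookup(String endpointId) {
        return endpoints.computeIfAbsent(endpointId, id -> new CompletableFuture<>());
    }

    // Called by the deployer when the endpoint is up; completes the shared future.
    void deployed(String endpointId, String address) {
        lookup(endpointId).complete(address);
    }
}

class AsyncLookupSketch {
    public static void main(String[] args) throws Exception {
        EndpointRegistry registry = new EndpointRegistry();

        // Ask for the address before the endpoint exists; no polling loop needed.
        CompletableFuture<String> address = registry.lookup("task-1");

        // Simulate the lazy deployment finishing on another thread.
        new Thread(() -> registry.deployed("task-1", "host-a:6123")).start();

        // Block here only for the demo; real callers would chain a callback.
        System.out.println(address.get(5, TimeUnit.SECONDS));
    }
}
```

With this shape, the wait is bounded by how long the deployment actually takes 
rather than by a polling interval.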

Asterios Katsifodimos (https://github.com/asteriosk) has been working on this 
for the past days/weeks.

The restriction in Akka is the maximum frame size of messages. We are looking 
into different options to get around that. A "download" service for large blobs 
is one option. I personally would like to avoid a DFS dependency, because that 
would mean more configuration (currently it runs very nicely out of the box) 
and more latency (which we are trying to get down at the moment).
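
As a rough sketch of the "download" service option (plain Java, not an 
existing Flink or Akka API): small payloads travel inside the message, while 
large ones are put into a blob store and only a key is sent. BlobStore, 
Payload, pack, unpack, and the tiny MAX_INLINE_BYTES limit are all assumptions 
made up for this illustration; the store here is an in-memory map, where a 
real service would serve the bytes over its own connection.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a blob "download" service.
class BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private int nextId = 0;

    // Stores a copy of the data and returns the key under which it can be fetched.
    String put(byte[] data) {
        String key = "blob-" + (nextId++);
        blobs.put(key, data.clone());
        return key;
    }

    byte[] get(String key) {
        return blobs.get(key);
    }
}

// A message payload: either the bytes themselves or only a key into the blob store.
class Payload {
    final byte[] inline;  // non-null when the data fits into one frame
    final String blobKey; // non-null when the data was offloaded

    Payload(byte[] inline, String blobKey) {
        this.inline = inline;
        this.blobKey = blobKey;
    }
}

class BlobOffloadSketch {
    // Assumed frame limit, kept tiny so the demo triggers offloading.
    static final int MAX_INLINE_BYTES = 16;

    static Payload pack(byte[] data, BlobStore store) {
        if (data.length <= MAX_INLINE_BYTES) {
            return new Payload(data, null);
        }
        return new Payload(null, store.put(data));
    }

    static byte[] unpack(Payload payload, BlobStore store) {
        return payload.inline != null ? payload.inline : store.get(payload.blobKey);
    }

    public static void main(String[] args) {
        BlobStore store = new BlobStore();
        byte[] jar = "pretend this is a large user JAR file".getBytes(StandardCharsets.UTF_8);
        Payload payload = pack(jar, store);
        System.out.println("offloaded: " + (payload.blobKey != null));
        System.out.println("bytes received: " + unpack(payload, store).length);
    }
}
```

The message itself stays well under the frame limit regardless of the JAR 
size, and no distributed file system is involved.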



> Distribute required JAR files with separate service
> ---------------------------------------------------
>
>                 Key: FLINK-939
>                 URL: https://issues.apache.org/jira/browse/FLINK-939
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>            Assignee: Daniel Warneke
>
> Currently, required user JAR files are distributed via the RPC service in 
> {{JobGraph.writeRequiredJarFiles(DataOutput, AbstractJobVertex[])}}. The RPC 
> service then tries to allocate a buffer on the client side heap to write the 
> on-disk JAR, which [can lead to 
> problems|https://github.com/apache/incubator-flink/pull/18].
> This should be replaced with a separate service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
