Nico Kruber created FLINK-6046:
----------------------------------

             Summary: Add support for oversized messages during deployment
                 Key: FLINK-6046
                 URL: https://issues.apache.org/jira/browse/FLINK-6046
             Project: Flink
          Issue Type: New Feature
          Components: Distributed Coordination
            Reporter: Nico Kruber
            Assignee: Nico Kruber


This is the non-FLIP6 version of FLINK-4346, restricted to deployment messages:

Currently, messages larger than the maximum Akka frame size cause an error when
being transported. We should add a way to pass messages that are larger than
{{akka.framesize}}, as may happen for task deployments via the
{{TaskDeploymentDescriptor}}.

We should use the {{BlobServer}} to offload big data items (if possible) and
make use of any distributed file system that may be backing it. This way, we
not only avoid the {{akka.framesize}} restriction but may also be able to speed
up deployment.

I suggest the following changes:
  - the sender, i.e. the {{Execution}} class, tries to store the serialized job
information and serialized task information (if oversized) from the
{{TaskDeploymentDescriptor}} (tdd) on the {{BlobServer}} as a single
{{NAME_ADDRESSABLE}} blob under its job ID; if this does not work, we send the
whole tdd via Akka as usual
  - if stored in a blob, these data items are removed from the tdd
  - the receiver, i.e. the {{TaskManager}} class, tries to retrieve any
offloaded data after receiving the {{TaskDeploymentDescriptor}} via Akka and
then re-assembles the original tdd
  - like all {{NAME_ADDRESSABLE}} blobs, these offloaded blobs are removed when
the job enters a final state
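
The steps above can be sketched roughly as follows. This is only an illustration of the proposed flow, not Flink code: {{InMemoryBlobStore}}, {{Descriptor}}, the blob name parameter and the {{maxInlineBytes}} threshold are hypothetical stand-ins for the {{BlobServer}}, the {{TaskDeploymentDescriptor}} and {{akka.framesize}}.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class OffloadSketch {

    /** Hypothetical stand-in for the BlobServer: blobs addressed by job ID and name. */
    public static final class InMemoryBlobStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        public void put(String jobId, String name, byte[] data) {
            blobs.put(jobId + "/" + name, data.clone());
        }

        public Optional<byte[]> get(String jobId, String name) {
            return Optional.ofNullable(blobs.get(jobId + "/" + name));
        }

        /** Mirrors the cleanup when a job enters a final state. */
        public void deleteAll(String jobId) {
            blobs.keySet().removeIf(key -> key.startsWith(jobId + "/"));
        }
    }

    /** Simplified descriptor: the payload is either inline or referenced by blob name. */
    public static final class Descriptor {
        public final String jobId;
        public final byte[] inlinePayload; // null when offloaded
        public final String blobName;      // null when sent inline

        public Descriptor(String jobId, byte[] inlinePayload, String blobName) {
            this.jobId = jobId;
            this.inlinePayload = inlinePayload;
            this.blobName = blobName;
        }
    }

    /** Sender side: offload an oversized payload, fall back to inline on failure. */
    public static Descriptor offloadIfOversized(
            Descriptor tdd, InMemoryBlobStore store, String blobName, int maxInlineBytes) {
        if (tdd.inlinePayload == null || tdd.inlinePayload.length <= maxInlineBytes) {
            return tdd; // small enough (or already offloaded): send as usual
        }
        try {
            store.put(tdd.jobId, blobName, tdd.inlinePayload);
        } catch (RuntimeException e) {
            return tdd; // storing failed: send the whole descriptor as usual
        }
        return new Descriptor(tdd.jobId, null, blobName);
    }

    /** Receiver side: fetch any offloaded payload and rebuild the full descriptor. */
    public static Descriptor reassemble(Descriptor tdd, InMemoryBlobStore store) {
        if (tdd.blobName == null) {
            return tdd; // nothing was offloaded
        }
        byte[] data = store.get(tdd.jobId, tdd.blobName)
                .orElseThrow(() -> new IllegalStateException(
                        "missing offloaded blob " + tdd.blobName));
        return new Descriptor(tdd.jobId, data, null);
    }
}
```

In this sketch the decision to offload is made purely on payload size; the actual change may need to account for the rest of the message as well.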

Further (future) changes may include:
  - separating the serialized job information and serialized task information
into two files and reusing the first one for all tasks
  - not re-deploying these two during job recovery (if possible)
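
As a rough illustration of the first future change, the receiver could keep the job information per job ID and fetch it only once for all tasks of that job. The class below is a hypothetical sketch, not an existing Flink API; the {{fetchJobInfo}} function stands in for a download from the {{BlobServer}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class JobInfoCacheSketch {

    private final Map<String, byte[]> jobInfoByJobId = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetchJobInfo; // hypothetical blob download
    private final AtomicInteger fetchCount = new AtomicInteger();

    JobInfoCacheSketch(Function<String, byte[]> fetchJobInfo) {
        this.fetchJobInfo = fetchJobInfo;
    }

    /** Returns the serialized job information, fetching it at most once per job. */
    byte[] jobInfoFor(String jobId) {
        return jobInfoByJobId.computeIfAbsent(jobId, id -> {
            fetchCount.incrementAndGet();
            return fetchJobInfo.apply(id);
        });
    }

    /** Drops the cached entry, e.g. when the job enters a final state. */
    void release(String jobId) {
        jobInfoByJobId.remove(jobId);
    }

    /** Number of actual fetches performed so far. */
    int fetches() {
        return fetchCount.get();
    }
}
```

The task information would still be transferred per task; only the shared job information is cached here.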



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
