GitHub user kl0u opened a pull request:
https://github.com/apache/flink/pull/887
Collect(): Fixing the akka.framesize size limitation.
In Apache Flink, the results of the collect() call were returned to the
client through akka. This imposed an inherent limit on the size of a job's
output, which could not exceed akka.framesize. Otherwise, akka would drop
the message.
To alleviate this without giving up akka's out-of-the-box efficiency for
small results, we decided to keep forwarding the non-oversized results
(i.e. those smaller than akka.framesize) through akka, and to use the
BlobCache module for forwarding the oversized (large) ones.
Now the JobManager receives and merges the small accumulators (as before),
and simply forwards to the Client the keys of the blobs storing the oversized
ones. It is now the responsibility of the Client to do the final merging
of oversized and non-oversized accumulators.
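
As a rough illustration of the routing described above (not the code in this
pull request), the sketch below sends small serialized results inline and
replaces oversized ones with a key into a blob store, leaving the final merge
to the client. Every name in it (OversizedResultRouting, InMemoryBlobStore,
Envelope, route, merge) is hypothetical and is not a Flink or akka API. The
size check is also simplified: it compares only the payload length against
the limit and ignores any message overhead.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: none of these names are Flink or akka APIs. */
public class OversizedResultRouting {

    /** Hypothetical stand-in for a blob store: maps a generated key to stored bytes. */
    static final class InMemoryBlobStore {
        private final Map<String, byte[]> blobs = new HashMap<>();
        private int nextId = 0;

        String put(byte[] data) {
            String key = "blob-" + (nextId++);
            blobs.put(key, data);
            return key;
        }

        byte[] get(String key) {
            return blobs.get(key);
        }
    }

    /** What travels in the message: small payloads inline, large ones as blob keys. */
    static final class Envelope {
        final Map<String, byte[]> inline = new HashMap<>();
        final Map<String, String> blobKeys = new HashMap<>();
    }

    /** Sender side: route each serialized result by comparing its size to the frame limit. */
    static Envelope route(Map<String, byte[]> results, long maxFrameBytes, InMemoryBlobStore store) {
        Envelope envelope = new Envelope();
        for (Map.Entry<String, byte[]> entry : results.entrySet()) {
            if (entry.getValue().length < maxFrameBytes) {
                // Small enough to travel inside the message itself.
                envelope.inline.put(entry.getKey(), entry.getValue());
            } else {
                // Oversized: store the bytes and forward only the key.
                envelope.blobKeys.put(entry.getKey(), store.put(entry.getValue()));
            }
        }
        return envelope;
    }

    /** Client side: fetch oversized results by key and merge them with the inline ones. */
    static Map<String, byte[]> merge(Envelope envelope, InMemoryBlobStore store) {
        Map<String, byte[]> merged = new HashMap<>(envelope.inline);
        for (Map.Entry<String, String> entry : envelope.blobKeys.entrySet()) {
            merged.put(entry.getKey(), store.get(entry.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        InMemoryBlobStore store = new InMemoryBlobStore();
        Map<String, byte[]> results = new HashMap<>();
        results.put("small", new byte[10]);
        results.put("large", new byte[100]);

        // A 64-byte limit is an arbitrary value chosen for this example.
        Envelope envelope = route(results, 64, store);
        Map<String, byte[]> merged = merge(envelope, store);
        System.out.println("inline=" + envelope.inline.keySet()
                + " blobs=" + envelope.blobKeys.keySet()
                + " merged=" + merged.size());
    }
}
```

In this sketch the sender never puts an oversized payload into the message,
so the message size is bounded by the small results plus the blob keys.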
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kl0u/flink collect_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/887.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #887
----
commit f417e2585fda1aca936b8e0637618d44cd0b81ca
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-04T14:50:48Z
A first working version of collect() with unbounded Accumulator sizes.
commit bf52a091b0fbb04426fa61949334cc44c548d6c2
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-04T15:31:47Z
Cleaned up the TaskManager side.
commit f0de184b0a3aac64bcaa753db0917778e031883e
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-04T18:54:08Z
Cleaned up till the JobManager side.
commit 10faf14c4df168da533a35fefa495c1b860ddf1d
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-04T22:56:09Z
Cleaned up the code. Missing the Stringified result.
commit 9cd35f46dcf5e6494196185621413ba793da0913
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T00:37:12Z
Fixed a version for the Stringified result.
commit e5787c74e48a9bed7c503e5d2e90c51b5f33d24f
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T01:45:38Z
Fixed a sanity check in the SerializedJobExecutionResult.
commit c36bab2c54f1e6a9f401be6eb1e9a75171342212
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T02:35:17Z
Fixed the cleaning up of the BlobCache after the end of the job.
commit 764f8bda9fbda58d3df7cac51f5b1b2c1cee10de
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T03:41:44Z
Fixed a test bug.
commit 1c1701a0bd8e4eef742d18494875176136f35233
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T14:52:06Z
Fixed a comment in the RuntimeEnvironment.
commit 1471bc22bd32675be91c96ec5e0e8ce884fc0bd0
Author: Kostas Kloudas <[email protected]>
Date: 2015-07-05T15:01:59Z
Fixed some method and class renaming.
----