GitHub user StephanEwen opened a pull request:
https://github.com/apache/incubator-flink/pull/122
Major Rework of JobManager scheduling classes
The new design separates scheduler, instance manager, and execution graph
cleanly.
- Job graphs can be incrementally constructed
- The execution graph is a slim asynchronous state machine
- The scheduler picks instances and slots on demand and uses sharing
groups / co-location constraints to implement resource sharing and co-location
- The management graph is removed; timestamps for state transitions are
now attached to the execution graph
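
As a rough illustration of what a slim asynchronous state machine with
per-state timestamps could look like, here is a minimal Java sketch. The
enum, class, and method names are hypothetical and are not taken from this
pull request. The sketch only assumes that state changes are applied with an
atomic compare-and-set and that each state records when it was entered.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical states; illustrative only, not the states used in the pull request. */
enum SketchState { CREATED, SCHEDULED, RUNNING, FINISHED, CANCELED, FAILED }

/** Minimal sketch of a vertex whose state changes without locks. */
class SketchVertex {

    private final AtomicReference<SketchState> state =
            new AtomicReference<>(SketchState.CREATED);

    // One timestamp per state, indexed by ordinal; 0 means "never entered".
    private final AtomicLongArray stateTimestamps =
            new AtomicLongArray(SketchState.values().length);

    SketchVertex() {
        stateTimestamps.set(SketchState.CREATED.ordinal(), System.currentTimeMillis());
    }

    /**
     * Attempts to move from 'expected' to 'next'. Returns false if the
     * current state is no longer 'expected', for example because a
     * concurrent cancel or failure call won the race.
     */
    boolean transition(SketchState expected, SketchState next) {
        if (state.compareAndSet(expected, next)) {
            stateTimestamps.set(next.ordinal(), System.currentTimeMillis());
            return true;
        }
        return false;
    }

    SketchState getState() {
        return state.get();
    }

    long getTimestamp(SketchState s) {
        return stateTimestamps.get(s.ordinal());
    }
}
```

A caller would treat a `false` return as "someone else changed the state
first" and re-read the state before deciding what to do next, rather than
blocking on a lock.
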
This pull request also fixes the following issues:
- FLINK-1094 (simplify input split assignment)
- FLINK-1030 (Cleanly separate the Instance Management and Resource
Assignment)
- FLINK-1029 (Attach slot information to scheduling/assignment information
in Scheduler)
- FLINK-989 (improve not enough slots SchedulingException message)
- FLINK-897 (Bug in scheduler for job graph instance sharing)
- FLINK-820 (Support for unconnected data flows - runtime parts)
- FLINK-625 (Add a fail(Exception) method to the job graph, to report
problems detected in RPC calls)
- FLINK-230 (Job Cancellation does not work properly: "Cannot find
execution graph to job ID")
- FLINK-165 (rework Nephele Task Hierarchy)
- FLINK-79 (Startup time of jobs)
- FLINK-15 (Rework Nephele Execution Graph State Machine)
- FLINK-13 (Bug in Nephele Scheduler)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink jm_rework
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-flink/pull/122.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #122
----
commit b970b384f32ef2700df1ee66e208b13864cfeaf6
Author: Stephan Ewen <[email protected]>
Date: 2014-07-15T17:08:48Z
[FLINK-989] Improve error message when not finding enough task slots.
commit 213d84ba411beea1e1d5a4e0a10c76b259b98615
Author: Stephan Ewen <[email protected]>
Date: 2014-07-16T14:09:14Z
[FLINK-1030] Refactor and clean up instance managers.
commit 29ad0e56c60bddf45e79275b75de35288e9a23df
Author: Stephan Ewen <[email protected]>
Date: 2014-07-18T02:22:00Z
[FLINK-1094] Reworked, improved, and tested split assigners
commit c9df0e791773a0b5d6149120176fa17e41cc75ea
Author: Stephan Ewen <[email protected]>
Date: 2014-06-22T17:05:02Z
Unify all job vertices to one type (rather than dedicated input/output
types)
commit a823010f9c05b8fd15a8ffb5621cf25a9902924a
Author: Stephan Ewen <[email protected]>
Date: 2014-06-23T17:44:18Z
Make IDs immutable and serializable.
commit 9281c06915cb91e4a47ce6c6263a175a7a05766a
Author: Stephan Ewen <[email protected]>
Date: 2014-06-30T15:12:10Z
Stubs for intermediate data set and related classes
commit 407b8bc1483044eb14488c7145c5d9c186adea41
Author: Stephan Ewen <[email protected]>
Date: 2014-07-20T11:11:23Z
Redesign Scheduler from pre-assignment to more flexible schedule-on-demand
model
commit 1a38c2be2aac004f4e07a20727c03be8f70f223f
Author: Stephan Ewen <[email protected]>
Date: 2014-07-21T17:09:26Z
Adapt RPC to support primitive types as parameters and return values.
commit d40c5683b41f14f677e7ac9ca161852e63700a2e
Author: Stephan Ewen <[email protected]>
Date: 2014-07-25T01:09:49Z
Redesign Scheduler part 2
commit b6821068322d1cfa7e4d7de503c5faad88a7bc96
Author: Stephan Ewen <[email protected]>
Date: 2014-06-30T19:16:15Z
Refactor job graph construction to incremental attachment based
commit fbfd3146788665d5424e91460d99893e2503da16
Author: Stephan Ewen <[email protected]>
Date: 2014-09-08T21:29:42Z
Remove management graph and simplify historic job status
commit 8fe71097c93e0a237a3478476ae02c3b4b0e1f3f
Author: Stephan Ewen <[email protected]>
Date: 2014-09-09T19:12:20Z
Introduce execution attempts at execution vertex.
Add tests for job event classes
commit 8ea25284b969aeb1c2edcf83ed7c7a345c49135f
Author: Stephan Ewen <[email protected]>
Date: 2014-09-11T05:18:56Z
Finalize ExecutionGraph state machine and calls
commit af905ff505528dd686b7a63dc9acda92041db9b9
Author: Stephan Ewen <[email protected]>
Date: 2014-09-11T14:31:50Z
Adjust ExecutionGraph state machine to TaskManager's failing model (direct
transitions to canceled)
commit 5813241a54981984e88a6e91e40c0b23a2bd69d8
Author: Stephan Ewen <[email protected]>
Date: 2014-09-12T12:57:54Z
Adjusted job graph generator to new job graph classes
commit 13ebcfb9974724d76b288c8531641c7fdfff5cd9
Author: Stephan Ewen <[email protected]>
Date: 2014-09-12T16:00:08Z
Adjust test logging for new execution graph tests to logback framework
commit bda267ce9aacd597a710adc5ebe5d1126aea0481
Author: Stephan Ewen <[email protected]>
Date: 2014-09-14T19:23:28Z
Add proper locality of scheduling tracking to scheduler. Add local
scheduling to slot sharing groups.
commit af921858c03b61dfa70d2c8a74e21498c3eeb086
Author: Stephan Ewen <[email protected]>
Date: 2014-09-14T22:39:00Z
Add options strict co-location constraints to scheduler
commit 90ca25ddaed7f8b1edff4ee1f0f7304a1b4e8ddc
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T00:43:18Z
More graceful failing/errors/logging when canceling in early job stages
commit d61cb78106476dd986440a397d7ac915de577cec
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T01:15:33Z
Fix buffer leak in TaskManager / test tasks
commit 16a9fddb083ca180321a4daff1fbae1a748bf5f5
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T01:45:01Z
Fix logging in EventCollector
Fix comparisons (null pointer safe) in JobManagerITCase
commit 2f0d7b042cb066d405921ba499d7ab0b741c1af7
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T14:51:12Z
Better error messages at TaskManager startup and registration
commit 2fff545f8b1398e8d19209d036dee02a945983ca
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T14:52:48Z
Fix race at TaskManager registration during startup
commit c8c64ef7c461e683ed78c45e95fca083eb0ab9bd
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T14:54:14Z
Fix serializability for SlotSharingGroup
commit 6f53653a17717b986b7a0131036ed189a279b33d
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T14:54:38Z
Adjust tests to new JobGraphModel
commit b9c03d5df5477f621802a938bcdb618d42acab87
Author: Stephan Ewen <[email protected]>
Date: 2014-09-15T19:15:07Z
Add co-location-constraints as a special case of slot-shared scheduling
commit 372b2ac2bd572da37f5b1146b36d19f881db5da8
Author: Stephan Ewen <[email protected]>
Date: 2014-09-16T17:48:02Z
Fix bug in topological sort
commit dc58aefb3a1003fbfef86f28864bb787e76ee64e
Author: Stephan Ewen <[email protected]>
Date: 2014-09-16T22:02:19Z
Port streaming package to new JobGraph API and adjust all runtime-level
tests
commit 62d9b3c089c87703229746c5ebe2056d42e796d0
Author: Stephan Ewen <[email protected]>
Date: 2014-09-16T22:03:00Z
Improved distribution of IDs. Previous implementation lost bits due to
double-to-long multiplication and rounding.
commit 7d01f8640634af0d7caa67070e6cabca6dcb91d7
Author: Stephan Ewen <[email protected]>
Date: 2014-09-16T22:04:28Z
Fix failure exception message report back to client
----
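
The "Improved distribution of IDs" commit above mentions bits lost through
double-to-long multiplication and rounding. The previous implementation is
not shown in this message, so the following Java sketch is only one plausible
reading of that problem. The class and method names are hypothetical. It
assumes an ID half was derived by scaling a random double in [0, 1) up to the
long range. Because a double carries a 53-bit significand and the usual
generators produce multiples of 2^-53, the scaled value always has its lowest
10 bits set to zero.

```java
import java.util.Random;

/** Hypothetical illustration; not the code from the pull request. */
class IdBitsSketch {

    /**
     * Assumed lossy scheme: scale a double in [0, 1) to the long range.
     * Long.MAX_VALUE is converted to the double 2^63, so a value of the
     * form m * 2^-53 becomes m * 2^10 and the low 10 bits carry no entropy.
     */
    static long lossyId(double unit) {
        return (long) (unit * Long.MAX_VALUE);
    }

    /** Alternative that draws all 64 bits directly from the generator. */
    static long fullId(Random rnd) {
        return rnd.nextLong();
    }

    /** Returns the lowest 10 bits of a value, which the lossy scheme leaves empty. */
    static long lowTenBits(long value) {
        return value & 0x3FFL;
    }
}
```

Calling `IdBitsSketch.lowTenBits(IdBitsSketch.lossyId(new Random().nextDouble()))`
yields zero on every draw, whereas the same mask applied to `fullId` varies
from call to call.
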