[ 
https://issues.apache.org/jira/browse/FLINK-20364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239052#comment-17239052
 ] 

Guruh Fajar Samudra commented on FLINK-20364:
---------------------------------------------

GitHub user tillrohrmann opened a pull request:

[https://github.com/apache/flink/pull/5091]

FLINK-7956 [flip6] Add support for queued scheduling with slot sharing to SlotPool

 ## What is the purpose of the change

This commit adds support for queued scheduling with slot sharing to the
SlotPool. Slot sharing means that multiple tasks can run in the same slot.
Queued scheduling means that a slot request need not be completed right
away but can be completed at a later point in time. This allows new
TaskExecutors to be started when no slots are left.
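
A minimal sketch of the queued-scheduling idea, assuming a toy pool in which slots are plain strings. `QueuedSlotPoolSketch`, `requestSlot` and this `offerSlot` are hypothetical names for illustration and are not the signatures of Flink's `SlotPool`:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, heavily simplified pool; NOT Flink's SlotPool. */
class QueuedSlotPoolSketch {
    private final Queue<String> availableSlots = new ArrayDeque<>();
    private final Queue<CompletableFuture<String>> pendingRequests = new ArrayDeque<>();

    /** Returns a completed future if a slot is free, otherwise a queued (pending) one. */
    CompletableFuture<String> requestSlot() {
        String slot = availableSlots.poll();
        if (slot != null) {
            return CompletableFuture.completedFuture(slot);
        }
        CompletableFuture<String> pending = new CompletableFuture<>();
        pendingRequests.add(pending);
        return pending;
    }

    /** Called when a slot arrives later, e.g. from a newly started TaskExecutor. */
    void offerSlot(String slot) {
        CompletableFuture<String> pending = pendingRequests.poll();
        if (pending != null) {
            pending.complete(slot); // fulfil the oldest queued request
        } else {
            availableSlots.add(slot); // nobody is waiting, keep the slot for later
        }
    }
}
```

The caller always receives a future, so the same code path covers both the immediate and the deferred case.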

The main component responsible for the management of shared slots is the
SlotSharingManager. The SlotSharingManager internally maintains a tree-like
structure which stores the SlotContext future of the underlying
AllocatedSlot. Whenever this future is completed, any pending
LogicalSlot instantiations are executed and sent to the slot requester.

A shared slot is represented by a MultiTaskSlot which can harbour multiple
TaskSlots. A TaskSlot can either be a MultiTaskSlot or a SingleTaskSlot.
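
The tree described above can be sketched roughly as follows. The class names carry a `Sketch` suffix because the bodies are illustrative assumptions, not the classes added by this PR; `countTasks` is a hypothetical helper that only exists to make the tree shape observable:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node type of the slot tree; illustrative only. */
abstract class TaskSlotSketch {
    /** Number of leaf tasks at or below this node. */
    abstract int countTasks();
}

/** Leaf: holds exactly one task. */
final class SingleTaskSlotSketch extends TaskSlotSketch {
    final String taskName;

    SingleTaskSlotSketch(String taskName) {
        this.taskName = taskName;
    }

    @Override
    int countTasks() {
        return 1;
    }
}

/** Inner node: can hold further multi-task slots or single-task leaves. */
final class MultiTaskSlotSketch extends TaskSlotSketch {
    private final List<TaskSlotSketch> children = new ArrayList<>();

    MultiTaskSlotSketch allocateMultiTaskSlot() {
        MultiTaskSlotSketch child = new MultiTaskSlotSketch();
        children.add(child);
        return child;
    }

    SingleTaskSlotSketch allocateSingleTaskSlot(String taskName) {
        SingleTaskSlotSketch leaf = new SingleTaskSlotSketch(taskName);
        children.add(leaf);
        return leaf;
    }

    @Override
    int countTasks() {
        int sum = 0;
        for (TaskSlotSketch child : children) {
            sum += child.countTasks();
        }
        return sum;
    }
}
```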

In order to represent co-location constraints, we first obtain a root
MultiTaskSlot and then allocate a nested MultiTaskSlot in which the
co-located tasks are allocated. The corresponding SlotRequestID is assigned
to the CoLocationConstraint in order to make the TaskSlot retrievable for
other tasks assigned to the same CoLocationConstraint.
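
As a rough illustration of the lookup, the sketch below stores the slot request id on the constraint so that later tasks with the same constraint find the same nested slot. It assumes `UUID` as a stand-in for `SlotRequestID` and a string as a stand-in for the nested MultiTaskSlot; all class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-in for a co-location constraint that remembers its slot request id. */
class CoLocationConstraintSketch {
    private UUID slotRequestId; // null until the first co-located task has been scheduled

    UUID getSlotRequestId() {
        return slotRequestId;
    }

    void setSlotRequestId(UUID slotRequestId) {
        this.slotRequestId = slotRequestId;
    }
}

/** Hypothetical lookup of nested slots by slot request id. */
class CoLocationLookupSketch {
    // Nested multi-task slots (modelled as strings), keyed by the id they were allocated under.
    private final Map<UUID, String> nestedSlots = new HashMap<>();
    private int counter;

    String getOrAllocateNestedSlot(CoLocationConstraintSketch constraint) {
        UUID knownId = constraint.getSlotRequestId();
        if (knownId != null && nestedSlots.containsKey(knownId)) {
            return nestedSlots.get(knownId); // an earlier co-located task already allocated it
        }
        UUID newId = UUID.randomUUID();
        String nested = "nested-multi-task-slot-" + counter++;
        nestedSlots.put(newId, nested);
        constraint.setSlotRequestId(newId); // make the slot retrievable for later tasks
        return nested;
    }
}
```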

This PR also moves the `SlotPool` components to `o.a.f.runtime.jobmaster.slotpool`.

This PR is based on #5090

 ## Brief change log

 - Add `SlotSharingManager` to manage shared slots
 - Rework `SlotPool` to use `SlotSharingManager`
 - Add `SlotPool#allocateMultiTaskSlot` to allocate a shared slot
 - Add `SlotPool#allocateCoLocatedMultiTaskSlot` to allocate a co-located slot
 - Move `SlotPool` components to `o.a.f.runtime.jobmaster.slotpool`

 ## Verifying this change

 - Port `SchedulerSlotSharingTest`, `SchedulerIsolatedTasksTest` and `ScheduleWithCoLocationHintTest` to run with `SlotPool`
 - Add `SlotSharingManagerTest`, `SlotPoolSlotSharingTest` and `SlotPoolCoLocationTest`

 ## Does this pull request potentially affect one of the following parts:

 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
 - The S3 file system connector: (no)

 ## Documentation

 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)

CC: @GJL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink slotPoolSlots

Alternatively you can review and apply these changes as the patch at:

[https://github.com/apache/flink/pull/5091.patch]

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5091
----
commit d30dde83548dbeff4249f3b57b67cdb6247af510
Author: Till Rohrmann <[email protected]>
Date: 2017-11-14T22:50:52Z

FLINK-8078 Introduce LogicalSlot interface

The LogicalSlot interface decouples the task deployment from the actual
slot implementation which at the moment is Slot, SimpleSlot and SharedSlot.
This is a helpful step to introduce a different slot implementation for
Flip-6.

commit e5da9566a6fc8a36ac8b06bae911c0dff5554e5d
Author: Till Rohrmann <[email protected]>
Date: 2017-11-15T13:20:27Z

FLINK-8085 Thin out LogicalSlot interface

Remove the isCanceled and isReleased methods and decouple the logical slot from
the Execution by introducing a Payload interface which is set for a LogicalSlot.
The Payload interface is implemented by the Execution and allows failing the
implementation and obtaining a termination future.

Introduce proper Execution#releaseFuture which is completed once the Execution's
assigned resource has been released.
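
A minimal sketch of this decoupling, assuming hypothetical method names (`fail`, `getTerminationFuture`, `tryAssignPayload`, `releaseSlot`) that are chosen for illustration and may differ from the actual interface in the commit:

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical payload contract: can be failed and exposes a termination future. */
interface PayloadSketch {
    void fail(Throwable cause);

    CompletableFuture<Void> getTerminationFuture();
}

/** Hypothetical logical slot that only knows its payload, not the Execution class. */
class LogicalSlotSketch {
    private PayloadSketch payload;

    /** Assigns the payload once; further attempts are rejected. */
    boolean tryAssignPayload(PayloadSketch newPayload) {
        if (payload != null) {
            return false;
        }
        payload = newPayload;
        return true;
    }

    /** Fails the payload (if any) and returns a future that completes once it has terminated. */
    CompletableFuture<Void> releaseSlot(Throwable cause) {
        if (payload == null) {
            return CompletableFuture.completedFuture(null);
        }
        payload.fail(cause);
        return payload.getTerminationFuture();
    }
}

/** Hypothetical stand-in for Execution; terminates immediately when failed. */
class ExecutionSketch implements PayloadSketch {
    private final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
    Throwable failureCause;

    @Override
    public void fail(Throwable cause) {
        failureCause = cause;
        terminationFuture.complete(null);
    }

    @Override
    public CompletableFuture<Void> getTerminationFuture() {
        return terminationFuture;
    }
}
```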

commit 84d86bebe2f9f8395430e7c71dd2393ba117b44f
Author: Till Rohrmann <[email protected]>
Date: 2017-11-24T17:03:49Z

FLINK-8087 Decouple Slot from AllocatedSlot

This commit introduces the SlotContext, an abstraction that lets the SimpleSlot
obtain the slot information needed to communicate with the TaskManager without
relying on the AllocatedSlot, which is now only used by the SlotPool.

commit 80a3cc848a0c724a2bc09b1b967cc9e6ccec5942
Author: Till Rohrmann <[email protected]>
Date: 2017-11-24T17:06:10Z

FLINK-8088 Associate logical slots with the slot request id

Previously, logical slots such as the SimpleSlot and SharedSlot were associated
with the actually allocated slot via the AllocationID. This, however, was
sub-optimal because allocated slots can also be re-used to fulfill other slot
requests (logical slots). Therefore, we should bind the logical slots to the id
with the right lifecycle, which is the slot request id.

commit 3e4550c0607744b20893dc90c587b63e68e4de1e
Author: Till Rohrmann <[email protected]>
Date: 2017-11-13T14:42:07Z

FLINK-8089 Also check for other pending slot requests in offerSlot

Not only check for a slot request with the right allocation id but also check
whether we can fulfill other pending slot requests with an unclaimed offered
slot before adding it to the list of available slots.
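
The matching order described in this commit message could look roughly like the sketch below. It models allocation ids and slots as strings and serves the oldest pending request when there is no exact match; that tie-breaking rule and all names are assumptions, not the actual `SlotPool#offerSlot` logic:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the offer-matching order; NOT Flink's SlotPool. */
class OfferSlotSketch {
    // Pending requests keyed by the allocation id they were issued for, in insertion order.
    private final Map<String, CompletableFuture<String>> pendingByAllocationId = new LinkedHashMap<>();
    private final List<String> availableSlots = new ArrayList<>();

    CompletableFuture<String> request(String allocationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pendingByAllocationId.put(allocationId, future);
        return future;
    }

    void offerSlot(String allocationId) {
        // 1. Prefer the request that asked for exactly this allocation id.
        CompletableFuture<String> exact = pendingByAllocationId.remove(allocationId);
        if (exact != null) {
            exact.complete(allocationId);
            return;
        }
        // 2. Otherwise the slot is unclaimed: try any other pending request (oldest first here).
        Iterator<CompletableFuture<String>> others = pendingByAllocationId.values().iterator();
        if (others.hasNext()) {
            CompletableFuture<String> other = others.next();
            others.remove();
            other.complete(allocationId);
            return;
        }
        // 3. Nobody is waiting: keep the slot as available.
        availableSlots.add(allocationId);
    }

    int availableCount() {
        return availableSlots.size();
    }
}
```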

commit b04dda46aaf298d921929910574662970d9c5093
Author: Till Rohrmann <[email protected]>
Date: 2017-11-24T22:29:53Z

[hotfix] Speed up RecoveryITCase

commit e512558917f9bb5005024630b8a015cd624164b4
Author: Till Rohrmann <[email protected]>
Date: 2017-11-24T17:08:38Z

FLINK-7956 [flip6] Add support for queued scheduling with slot sharing to SlotPool

This commit adds support for queued scheduling with slot sharing to the
SlotPool. Slot sharing means that multiple tasks can run in the same slot.
Queued scheduling means that a slot request need not be completed right
away but can be completed at a later point in time. This allows new
TaskExecutors to be started when no slots are left.

The main component responsible for the management of shared slots is the
SlotSharingManager. The SlotSharingManager internally maintains a tree-like
structure which stores the SlotContext future of the underlying
AllocatedSlot. Whenever this future is completed, any pending
LogicalSlot instantiations are executed and sent to the slot requester.

A shared slot is represented by a MultiTaskSlot which can harbour multiple
TaskSlots. A TaskSlot can either be a MultiTaskSlot or a SingleTaskSlot.

In order to represent co-location constraints, we first obtain a root
MultiTaskSlot and then allocate a nested MultiTaskSlot in which the
co-located tasks are allocated. The corresponding SlotRequestID is assigned
to the CoLocationConstraint in order to make the TaskSlot retrievable for
other tasks assigned to the same CoLocationConstraint.

Port SchedulerSlotSharingTest, SchedulerIsolatedTasksTest and
ScheduleWithCoLocationHintTest to run with SlotPool.

Restructure SlotPool components.

Add SlotSharingManagerTest, SlotPoolSlotSharingTest and
SlotPoolCoLocationTest.

commit 6489c6769a40b70f49b827784c810f954c413361
Author: Till Rohrmann <[email protected]>
Date: 2017-11-27T08:29:54Z

[hotfix] [tests] Speed up queryable state IT tests by removing sleep

> Add support for scheduling with slot sharing
> --------------------------------------------
>
>                 Key: FLINK-20364
>                 URL: https://issues.apache.org/jira/browse/FLINK-20364
>             Project: Flink
>          Issue Type: Test
>          Components: Runtime / Coordination
>    Affects Versions: statefun-2.2.1
>            Reporter: Guruh Fajar Samudra
>            Priority: Major
>             Fix For: statefun-2.2.2
>
>
> In order to reach feature equivalence with the old code base, we should add 
> support for scheduling with slot sharing to the SlotPool. This will also 
> allow us to run all the IT cases based on the {{AbstractTestBase}} on the 
> Flip-6 {{MiniCluster}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
