[ 
https://issues.apache.org/jira/browse/SLIDER-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012184#comment-15012184
 ] 

MENG DING commented on SLIDER-938:
----------------------------------

I apologize for the very late follow-up; I got sidetracked by some other 
tasks.

I have prototyped the container resize feature in Slider. The following 
proposal is up for discussion:

h3. Overall Strategy
Provide a Java API and a CLI for changing the resources of a running container.
To [~ste...@apache.org]: I don't quite understand your option #2. Do you mean 
manually updating resources.json and then running {{slider update}}? If so, 
that cannot change the resources of a specific container that is still 
running, can it?

h3. User Interface
h5. CLI
Two options to consider for the CLI:
# Extend the existing *slider flex* command:
{{slider flex <application> --id <id> --memory <memory> --vcores <vcores>}}
# Add a new command:
{{slider resize-container <application> --id <id> --memory <memory> --vcores <vcores>}}

The {{id}} option is mandatory. At least one resource dimension ({{memory}} 
and/or {{vcores}}) must be specified; an unspecified dimension remains 
unchanged. If {{vcores}} is specified, the underlying YARN cluster must have 
*DominantResourceCalculator* configured, since the default resource calculator 
only takes memory into account.
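The option rules above could be enforced on the client side before any RPC is sent. The Java sketch below is only an illustration: {{ResizeArgsCheck}} and {{validate}} are hypothetical names, and the {{dominantResourceCalculator}} flag is an assumed input standing in for however the client would learn which resource calculator the YARN scheduler uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side check of the proposed resize-container options.
// None of these names exist in Slider; they only illustrate the stated rules.
class ResizeArgsCheck {

    /**
     * Returns the list of rule violations; an empty list means the options are valid.
     * memoryMb and vcores are null when the corresponding option was not given.
     * dominantResourceCalculator is an assumed input saying whether the YARN
     * scheduler is configured with DominantResourceCalculator.
     */
    static List<String> validate(String containerId, Integer memoryMb, Integer vcores,
                                 boolean dominantResourceCalculator) {
        List<String> errors = new ArrayList<>();
        if (containerId == null || containerId.isEmpty()) {
            errors.add("--id is mandatory");
        }
        if (memoryMb == null && vcores == null) {
            errors.add("at least one of --memory or --vcores must be specified");
        }
        if (vcores != null && !dominantResourceCalculator) {
            errors.add("--vcores requires DominantResourceCalculator");
        }
        return errors;
    }

    public static void main(String[] args) {
        String id = "container_e17_1447786590303_0002_01_000002";
        System.out.println(validate(id, 3072, null, false));  // prints []
        System.out.println(validate(id, null, 2, false));     // one error: vcores without DRC
        System.out.println(validate(null, null, null, true)); // two errors: no id, no dimension
    }
}
```

Returning all violations at once, rather than failing on the first, lets the CLI report every problem in a single error message.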

For example, 
{{slider resize-container memcached --id container_e17_1447786590303_0002_01_000002 --memory 3072}} 
instructs Slider to change the memory allocation of container 
container_e17_1447786590303_0002_01_000002 in the memcached application to 
3072 MB, while keeping that container's number of vcores unchanged.
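The "unspecified dimension remains unchanged" rule amounts to merging the requested values with the container's current allocation. The Java sketch below shows that merge for the example command; {{TargetResource}} is a hypothetical class, and the starting allocation of 2048 MB and 2 vcores is an assumed value chosen for illustration. Memory is expressed in MB, as in YARN.

```java
// Hypothetical helper: merge the requested dimensions with the container's
// current allocation. A null argument means "not specified, keep current".
class TargetResource {
    final int memoryMb;
    final int vcores;

    TargetResource(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    static TargetResource resolve(TargetResource current, Integer memoryMb, Integer vcores) {
        return new TargetResource(
            memoryMb != null ? memoryMb : current.memoryMb,
            vcores != null ? vcores : current.vcores);
    }

    @Override
    public String toString() {
        return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
    }

    public static void main(String[] args) {
        // Assumed current allocation of the memcached container: 2048 MB, 2 vcores.
        TargetResource current = new TargetResource(2048, 2);
        // Equivalent of: slider resize-container memcached --id ... --memory 3072
        System.out.println(resolve(current, 3072, null)); // prints <memory:3072, vCores:2>
    }
}
```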

I prefer option 2, as it is cleaner and less error-prone. The API and protocol 
changes below assume this option; if we agree on a different one, I will 
update the design accordingly.

h5. API
{code:title=org.apache.slider.client.SliderClientAPI}
  /**
   * Change resource of a specific running container in the cluster
   * @param name cluster name
   * @param args arguments
   * @return exit code
   * @throws YarnException
   * @throws IOException
   */
  int actionResizeContainer(String name, ActionResizeContainerArgs args)
      throws YarnException, IOException;
{code}

h3. Protocol Changes between Slider client and Slider AppMaster
{code:title=SliderClusterProtocol.proto}
service SliderClusterProtocolPB {
  ...
  /**
   * change resource of a container
   */
  rpc resizeContainer(ResizeContainerRequestProto)
    returns(ResizeContainerResponseProto);
  ...
}
{code}
{code:title=SliderClusterMessages.proto}
...
message ResourceProto {
  optional int32 memory = 1;
  optional int32 virtual_cores = 2;
}

message ResizeContainerRequestProto {
  optional string id = 1;
  optional ResourceProto resource = 2;
}

message ResizeContainerResponseProto {
  optional bool success = 1;
}
...
{code}

Unlike YARN, Slider commits the protobuf-generated code into git. Once the 
protocol change is approved, I will compile the code once with the 
{{-Pcompile-protobuf}} option and commit the following generated files:
{{slider-core/src/main/java/org/apache/slider/api/proto/Messages.java}}
{{slider-core/src/main/java/org/apache/slider/api/proto/SliderClusterAPI.java}}

h3. Slider AppMaster Changes
# Currently {{SliderAppMaster}} directly implements both the 
{{AMRMClientAsync.CallbackHandler}} and {{NMClientAsync.CallbackHandler}} 
interfaces. These have been deprecated by YARN-1509 and YARN-1510 and replaced 
by two abstract classes, {{AMRMClientAsync.AbstractCallbackHandler}} and 
{{NMClientAsync.AbstractCallbackHandler}}. Since a Java class cannot extend more 
than one class, we need to create nested classes inside {{SliderAppMaster}} to 
handle the callbacks, similar to the implementation of the {{DistributedShell}} 
example in YARN:
{code}
SliderAppMaster.RMCallbackHandler extends AMRMClientAsync.AbstractCallbackHandler
SliderAppMaster.NMCallbackHandler extends NMClientAsync.AbstractCallbackHandler
{code}
This means, of course, that the code must be compiled against Hadoop 2.8+.
# The new {{AMRMClientAsync.requestContainerResourceChange}} and 
{{NMClientAsync.increaseContainerResource}} APIs require the application master 
to keep track of all allocated containers and to pass the relevant container as 
a parameter when calling these methods. Fortunately, Slider already tracks these 
containers in {{RoleInstance.container}}.
# The high-level calling sequence:
* Sending request:
{code}
SliderIPCService.resizeContainer --> queue(ActionRequestContainerResize) --> 
... --> SliderAppMaster.requestContainerResourceChange --> 
AsyncRMOperationHandler.requestContainerResourceChange --> 
AMRMClientAsync.requestContainerResourceChange
{code}
* Processing response:
{code}
SliderAppMaster.RMCallbackHandler.onContainersResourceChanged --> 
AppState.onContainersResourceChanged --> queue(ActionIncreaseContainerResource) 
--> SliderAppMaster.increaseContainerResource --> 
NMClientAsync.increaseContainerResource
{code}
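The three AppMaster changes above can be tied together in one small, single-threaded Java model: a nested handler class that extends an abstract base (change 1), a lookup of the tracked container by ID before the request is issued (change 2), and the two-phase, queue-driven sequence (change 3). This is a heavily simplified sketch: {{RMHandlerBase}}, {{ContainerStub}}, {{Action}}, {{drain}} and the method bodies are hypothetical stand-ins for YARN's records and clients and for Slider's action queue, and only the increase path described above is modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Single-threaded model of the proposed AM-side resize flow. All types here are
// hypothetical stand-ins; real code would use YARN's Container/Resource records,
// AMRMClientAsync, NMClientAsync and Slider's own action queue.
class ResizeFlowSketch {

    // Stand-in for AMRMClientAsync.AbstractCallbackHandler (one method only).
    static abstract class RMHandlerBase {
        abstract void onContainersResourceChanged(List<String> containerIds);
    }

    // Stand-in for a YARN Container record: an ID and its memory in MB.
    static final class ContainerStub {
        final String id;
        final int memoryMb;
        ContainerStub(String id, int memoryMb) { this.id = id; this.memoryMb = memoryMb; }
    }

    interface Action { void execute(); }

    final Queue<Action> queue = new ArrayDeque<>();
    final Map<String, ContainerStub> containers = new HashMap<>(); // cf. RoleInstance.container
    final List<String> log = new ArrayList<>();

    // Change 1: a nested class extends the abstract handler, because the AM class
    // itself cannot extend both the RM and the NM handler base classes.
    class RMCallbackHandler extends RMHandlerBase {
        @Override
        void onContainersResourceChanged(List<String> containerIds) {
            // Phase 2: for each approved change, queue the NM-side increase.
            for (String id : containerIds) {
                queue.add(() -> increaseContainerResource(id));
            }
        }
    }

    final RMCallbackHandler rmHandler = new RMCallbackHandler();

    // Phase 1 (IPC entry point): look up the tracked container (change 2), then
    // queue the RM request. Returns false for an unknown container ID.
    boolean resizeContainer(String id, int memoryMb) {
        ContainerStub c = containers.get(id);
        if (c == null) {
            return false;
        }
        queue.add(() -> requestContainerResourceChange(c, memoryMb));
        return true;
    }

    // Stand-in for AMRMClientAsync.requestContainerResourceChange.
    void requestContainerResourceChange(ContainerStub c, int memoryMb) {
        log.add("RM request: " + c.id + " " + c.memoryMb + " -> " + memoryMb + " MB");
    }

    // Stand-in for NMClientAsync.increaseContainerResource.
    void increaseContainerResource(String id) {
        log.add("NM increase: " + id);
    }

    // Execute queued actions in FIFO order until the queue is empty.
    void drain() {
        Action a;
        while ((a = queue.poll()) != null) {
            a.execute();
        }
    }

    public static void main(String[] args) {
        ResizeFlowSketch am = new ResizeFlowSketch();
        am.containers.put("container_01", new ContainerStub("container_01", 2048));
        am.resizeContainer("container_01", 3072);
        am.drain();
        // Simulate the RM later approving the change through the callback.
        am.rmHandler.onContainersResourceChanged(List.of("container_01"));
        am.drain();
        am.log.forEach(System.out::println);
    }
}
```

Queuing both steps as actions, instead of calling the YARN clients directly from the IPC or callback thread, mirrors the {{queue(...)}} steps in the sequences above and keeps state changes on one execution path.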

h3. Other Considerations
# Unlike flex changes, container resize information cannot be persisted across 
application restarts. The current Slider framework only persists instance 
locations through RoleHistory, and persisting the memory allocation of *each* 
role instance for restart would be rather complicated. This can be left as a 
future enhancement if needed.
# REST API support:
* Showing real-time container resource allocation: the {{slideram/stats}} 
endpoint already provides a link to each individual container, which shows that 
container's effective memory allocation. Note that this differs from the 
{{Role Options:yarn.memory}} value displayed by the {{slideram/stats}} 
endpoint, which only shows the static value specified in resources.json.
* Container resize through the REST API: this will be supported later, once 
SLIDER-151 is completed.

> Add ability to resize containers (Hadoop 2.8+)
> ----------------------------------------------
>
>                 Key: SLIDER-938
>                 URL: https://issues.apache.org/jira/browse/SLIDER-938
>             Project: Slider
>          Issue Type: New Feature
>          Components: appmaster
>            Reporter: Steve Loughran
>            Assignee: MENG DING
>              Labels: hadoop-2.8
>
> Hadoop 2.8 will add container resize in YARN-1197: support that for dynamic 
> container resize



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
