[GitHub] [samza] mynameborat commented on a change in pull request #1442: SAMZA-2602: Dynamic heartbeat establish with new AM

GitBox Mon, 23 Nov 2020 11:07:36 -0800


mynameborat commented on a change in pull request #1442:
URL: https://github.com/apache/samza/pull/1442#discussion_r528932983




##########
File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
##########
@@ -324,6 +324,9 @@ Samza supports both standalone and clustered ([YARN](yarn-jobs.html)) [deploymen
 |job.container.count|1|The number of YARN containers to request for running your job. This is the main parameter for controlling the scale (allocated computing resources) of your job: to increase the parallelism of processing, you need to increase the number of containers. The minimum is one container, and the maximum number of containers is the number of task instances (usually the number of input stream partitions). Task instances are evenly distributed across the number of containers that you specify.|
 |cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to request from the cluster manager per container of your job. Along with cluster-manager.container.cpu.cores, this property determines how many containers the cluster manager will run on one machine. If the container exceeds this limit, it will be killed, so it is important that the container's actual memory use remains below the limit. The amount of memory used is normally the JVM heap size (configured with task.opts), plus the size of any off-heap memory allocation (for example stores.*.container.cache.size.bytes), plus a safety margin to allow for JVM overheads.|
 |cluster-manager.container.cpu.cores|1|The number of CPU cores to request per container of your job. Each node in the cluster has a certain number of CPU cores available, so this number (along with cluster-manager.container.memory.mb) determines how many containers can be run on one machine.|
+|job.coordinator.high-availability.enabled|false|If true, enables Job Coordinator (AM) high availability (HA) where a new AM can establish connection with already running containers.
+|job.coordinator.dynamic-heartbeat.retry.count|5|If AM-HA is enabled, when a running container loses heartbeat with AM, this count gives the number of times an already running container will attempt to establish heartbeat with new AM|
+|job.coordinator.dynamic-heartbeat.reconnect-sleep-duration.ms|10000|If AM-HA is enabled, when a running container loses heartbeat with AM, this duration gives the amount of time a running container will sleep between attempts to establish heartbeat with new AM.|

Review comment:
       For bookkeeping: while container heartbeat can serve as a generic model for interaction between the job coordinator and containers, the contracts today aren't generic enough and are tied to the YARN-specific implementation.

   Standalone, for example, uses a different mechanism for container/stream processor membership, with quorum and disconnects. For now, it would be good to move these configs to a YARN-specific namespace and revisit them later, once we have defined clear contracts for heartbeat as a concept (agnostic to the underlying cluster runtime) between the container and the job coordinator.
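To make the semantics of the three proposed keys concrete, here is a minimal sketch of the container-side reconnect loop they describe. It is an illustration under stated assumptions, not Samza's implementation: the class `DynamicHeartbeatReconnect`, its method `reconnect`, and the `BooleanSupplier` hook standing in for "one heartbeat attempt against the new AM" are all hypothetical. It assumes the container gives up immediately when AM-HA is disabled, and that it sleeps only *between* attempts (as the table says), not after the final one.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not Samza's code) of how a running container might apply
 * job.coordinator.high-availability.enabled,
 * job.coordinator.dynamic-heartbeat.retry.count and
 * job.coordinator.dynamic-heartbeat.reconnect-sleep-duration.ms
 * after losing its heartbeat with the AM.
 */
final class DynamicHeartbeatReconnect {
  // Defaults copied from the proposed documentation table.
  static final int DEFAULT_RETRY_COUNT = 5;
  static final long DEFAULT_RECONNECT_SLEEP_MS = 10_000L;

  private final boolean highAvailabilityEnabled;
  private final int retryCount;
  private final long reconnectSleepMs;

  DynamicHeartbeatReconnect(boolean highAvailabilityEnabled, int retryCount, long reconnectSleepMs) {
    if (retryCount < 0 || reconnectSleepMs < 0) {
      throw new IllegalArgumentException("retryCount and reconnectSleepMs must be non-negative");
    }
    this.highAvailabilityEnabled = highAvailabilityEnabled;
    this.retryCount = retryCount;
    this.reconnectSleepMs = reconnectSleepMs;
  }

  /**
   * Tries to re-establish the heartbeat with a (possibly new) AM.
   *
   * @param attempt hypothetical hook performing one heartbeat attempt; returns true on success
   * @return the 1-based number of the successful attempt, or -1 if HA is disabled,
   *         every attempt failed, or the thread was interrupted while sleeping
   */
  int reconnect(BooleanSupplier attempt) {
    if (!highAvailabilityEnabled) {
      return -1; // assumption in this sketch: without AM-HA the container does not retry
    }
    for (int i = 1; i <= retryCount; i++) {
      if (attempt.getAsBoolean()) {
        return i;
      }
      if (i < retryCount) { // sleep only between attempts, not after the last one
        try {
          Thread.sleep(reconnectSleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
          return -1;
        }
      }
    }
    return -1;
  }
}
```

Under these assumptions, a container that never reaches the new AM with the documented defaults makes 5 attempts and sleeps 4 times, so it spends roughly 4 × 10 s = 40 s sleeping before giving up. Keeping the heartbeat attempt behind a plain hook also leaves the loop itself independent of the cluster runtime, which is in line with the concern raised above, although the config keys themselves would still need the namespacing decision.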



