mynameborat commented on a change in pull request #1442:
URL: https://github.com/apache/samza/pull/1442#discussion_r528932983
##########
File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
##########
@@ -324,6 +324,9 @@ Samza supports both standalone and clustered
([YARN](yarn-jobs.html)) [deploymen
|job.container.count|1|The number of YARN containers to request for running
your job. This is the main parameter for controlling the scale (allocated
computing resources) of your job: to increase the parallelism of processing,
you need to increase the number of containers. The minimum is one container,
and the maximum number of containers is the number of task instances (usually
the number of input stream partitions). Task instances are evenly distributed
across the number of containers that you specify.|
|cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to
request from the cluster manager per container of your job. Along with
cluster-manager.container.cpu.cores, this property determines how many
containers the cluster manager will run on one machine. If the container
exceeds this limit, it will be killed, so it is important that the container's
actual memory use remains below the limit. The amount of memory used is
normally the JVM heap size (configured with task.opts), plus the size of any
off-heap memory allocation (for example stores.*.container.cache.size.bytes),
plus a safety margin to allow for JVM overheads.|
|cluster-manager.container.cpu.cores|1|The number of CPU cores to request per
container of your job. Each node in the cluster has a certain number of CPU
cores available, so this number (along with
cluster-manager.container.memory.mb) determines how many containers can be run
on one machine.|
+|job.coordinator.high-availability.enabled|false|If true, enables Job
Coordinator (AM) high availability (HA), where a new AM can establish
connections with already running containers.|
+|job.coordinator.dynamic-heartbeat.retry.count|5|If AM-HA is enabled, the
number of times a running container that loses its heartbeat with the AM will
attempt to establish a heartbeat with the new AM.|
+|job.coordinator.dynamic-heartbeat.reconnect-sleep-duration.ms|10000|If AM-HA
is enabled, the time a running container that loses its heartbeat with the AM
sleeps between attempts to establish a heartbeat with the new AM.|
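The sizing rules in the table above reduce to simple arithmetic. The Java sketch below is a hypothetical helper (`ContainerSizing` and its methods are illustrative names, not part of Samza). It assumes a node fits as many containers as both its memory and its cores allow, ignoring scheduler-specific details such as which YARN resource calculator is configured (some configurations only enforce memory). It estimates container memory as heap plus off-heap plus a margin using assumed example values, and it spreads task instances round-robin, which may differ from the grouper Samza actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the sizing rules in the config table above.
// The class and method names are illustrative only; they are not part of Samza.
class ContainerSizing {

    // Containers that fit on one node when both memory and cores are enforced:
    // whichever resource runs out first sets the limit (integer division floors).
    static int containersPerNode(int nodeMemoryMb, int nodeCores,
                                 int containerMemoryMb, int containerCores) {
        int byMemory = nodeMemoryMb / containerMemoryMb;
        int byCores = nodeCores / containerCores;
        return Math.min(byMemory, byCores);
    }

    // Estimated container memory: JVM heap (from task.opts) + off-heap allocations
    // (e.g. store caches) + a safety margin for JVM overheads.
    static long estimatedContainerMemoryMb(long heapMb, long offHeapMb, long marginMb) {
        return heapMb + offHeapMb + marginMb;
    }

    // Spreads task instances round-robin over the containers. The container count
    // is clamped to [1, number of tasks], matching the documented min and max.
    static List<List<String>> distributeTasks(List<String> tasks, int requestedContainers) {
        int count = Math.max(1, Math.min(requestedContainers, tasks.size()));
        List<List<String>> containers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            containers.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            containers.get(i % count).add(tasks.get(i));
        }
        return containers;
    }

    public static void main(String[] args) {
        // Assumed node: 64 GB RAM and 16 cores, with the documented defaults of
        // 1024 MB and 1 core per container. Cores run out first, so 16 fit.
        System.out.println(containersPerNode(65536, 16, 1024, 1)); // 16

        // Assumed values: 768 MB heap (-Xmx768m), 128 MB store cache, 128 MB margin.
        long needed = estimatedContainerMemoryMb(768, 128, 128);
        System.out.println(needed <= 1024); // true: fits the default container size

        // Eight input partitions (one task instance each) over three containers.
        List<String> tasks = Arrays.asList("p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7");
        System.out.println(distributeTasks(tasks, 3)); // [[p0, p3, p6], [p1, p4, p7], [p2, p5]]
    }
}
```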
Review comment:
For bookkeeping: while container heartbeat can serve as a generic model for
interaction between the job coordinator and containers, the contracts today
aren't generic enough and are tied to the YARN-specific implementation.
Standalone, for example, uses a different mechanism for container/stream
processor membership, with quorum and disconnects. For now, it would be good to
move these configs to a YARN-specific namespace and revisit them later, once we
have defined clear contracts for heartbeat as a concept (agnostic to the
underlying cluster runtime) between the container and the job coordinator.
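To make the semantics of the two `job.coordinator.dynamic-heartbeat.*` settings concrete, here is a minimal, runtime-agnostic Java sketch of how a container might apply them after losing its heartbeat with the AM. `HeartbeatReconnect` and the `attemptHeartbeat` callback are hypothetical. The sketch assumes the retry count bounds the total number of attempts and that the container sleeps only between failed attempts; Samza's actual implementation may differ on both points.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of how a container could apply
// job.coordinator.dynamic-heartbeat.retry.count and
// job.coordinator.dynamic-heartbeat.reconnect-sleep-duration.ms.
// Not Samza's actual implementation.
class HeartbeatReconnect {

    // Tries to re-establish a heartbeat with the (new) AM up to retryCount times,
    // sleeping sleepMs between failed attempts. Returns true on success.
    static boolean reconnect(BooleanSupplier attemptHeartbeat, int retryCount, long sleepMs) {
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            if (attemptHeartbeat.getAsBoolean()) {
                return true; // heartbeat re-established with the new AM
            }
            if (attempt < retryCount) {
                try {
                    Thread.sleep(sleepMs); // wait before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // retries exhausted; the caller decides what to do next
    }

    public static void main(String[] args) {
        // Simulated AM that only answers on the third attempt.
        int[] calls = {0};
        BooleanSupplier flakyAm = () -> ++calls[0] >= 3;
        // Short sleep for the demo; the documented default is 10000 ms.
        boolean ok = reconnect(flakyAm, 5, 10);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```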