samarthjain commented on a change in pull request #7088: Improve parallelism of
zookeeper based segment change processing
URL: https://github.com/apache/incubator-druid/pull/7088#discussion_r277127360
##########
File path: docs/content/configuration/index.md
##########
@@ -1254,9 +1254,9 @@ These Historical configurations can be defined in the
`historical/runtime.proper
|`druid.segmentCache.dropSegmentDelayMillis`|How long a process delays before
completely dropping segment.|30000 (30 seconds)|
|`druid.segmentCache.infoDir`|Historical processes keep track of the segments
they are serving so that when the process is restarted they can reload the same
segments without waiting for the Coordinator to reassign. This path defines
where this metadata is kept. Directory will be created if
needed.|${first_location}/info_dir|
|`druid.segmentCache.announceIntervalMillis`|How frequently to announce
segments while segments are loading from cache. Set this value to zero to wait
for all segments to be loaded before announcing.|5000 (5 seconds)|
-|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load
concurrently from from deep storage.|10|
-|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of
threads for executing callback actions associated with loading or dropping of
segments.|2|
-|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads
to use for monitoring deletion of zk nodes|1|
+|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load
concurrently from deep storage. Note that loading a segment involves
downloading it from deep storage, decompressing it, and memory-mapping it, so
the work is not entirely I/O bound. Depending on CPU and network load, this
value can be increased.|Number of cores|
+|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of
threads for executing callback actions associated with loading or dropping of
segments. Consider increasing this number if the cluster is lagging behind in
balancing segments across historical nodes.|2|
+|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads
to use for monitoring deletion of ZooKeeper nodes. Tasks in this pool are
scheduled to run `druid.coordinator.load.timeout` after a segment is added to
the queue. Increase this number if segments are not getting loaded or dropped
even after `druid.coordinator.load.timeout` has elapsed, since it is possible
they are not being re-assigned to the queues of other historicals soon
enough.|1|
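
For illustration, the three settings above might appear together in a
`runtime.properties` file like this (the values here are examples only, not
recommendations):

```properties
# Illustrative values only; the defaults from the table above are
# "number of cores", 2, and 1 respectively.
druid.segmentCache.numLoadingThreads=16
druid.coordinator.loadqueuepeon.curator.numCallbackThreads=4
druid.coordinator.loadqueuepeon.curator.numMonitorThreads=2
```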
Review comment:
   To be honest, this is a fairly advanced setting, and the operator would need
to know the nitty-gritty details of segment assignment and loading.
   The ZooKeeper node created for processing a segment load/drop should be
deleted within `druid.coordinator.load.timeout`. If the node does not get
deleted, it means the historical failed to process the request. When such a
timeout happens, the load queue peon needs to mark the change request as
failed and effectively tell the coordinator to assign it to another
historical. With several concurrent change requests in flight, it is possible
that such reassignment does not run fast enough. Such a scenario is very
unlikely, though, which is why the default number of threads is 1.
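
   The timeout check described above can be sketched roughly as follows; the
class and method names here are hypothetical and do not match Druid's actual
code, this is only meant to illustrate the decision the monitor task makes
when it fires `druid.coordinator.load.timeout` after a request is queued:

```java
import java.util.Set;

// Hypothetical sketch, not Druid's actual API. The historical deletes the
// ZooKeeper request node once it has processed the load/drop. If the node
// still exists when the monitor task fires (druid.coordinator.load.timeout
// after queuing), the request is considered failed so the coordinator can
// reassign it to another historical.
class LoadRequestMonitor {
    static String resolveAfterTimeout(Set<String> liveZkNodes, String requestNode) {
        // Node still present -> the historical never processed the request.
        return liveZkNodes.contains(requestNode) ? "FAILED" : "SUCCESS";
    }
}
```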
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]