samarthjain commented on a change in pull request #7088: Improve parallelism of
zookeeper based segment change processing
URL: https://github.com/apache/incubator-druid/pull/7088#discussion_r267569298
##########
File path: docs/content/configuration/index.md
##########
@@ -1251,7 +1251,11 @@ These Historical configurations can be defined in the
`historical/runtime.proper
|`druid.segmentCache.infoDir`|Historical nodes keep track of the segments they
are serving so that when the process is restarted they can reload the same
segments without waiting for the Coordinator to reassign. This path defines
where this metadata is kept. Directory will be created if
needed.|${first_location}/info_dir|
|`druid.segmentCache.announceIntervalMillis`|How frequently to announce
segments while segments are loading from cache. Set this value to zero to wait
for all segments to be loaded before announcing.|5000 (5 seconds)|
|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load
concurrently from from deep storage.|10|
-|`druid.segmentCache.numBootstrapThreads`|How many segments to load
concurrently from local storage at startup.|Same as numLoadingThreads|
+|`druid.coordinator.loadqueuepeon.curator.numCreateThreads`|Number of threads
creating zk nodes corresponding to segments that need to be loaded or
dropped.|10|
+|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of
threads for executing callback actions associated with loading or dropping of
segments.|2|
+|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads
to use for monitoring deletion of zk nodes|1|
+|`druid.coordinator.curator.create.zknode.batchSize`|Number of zk nodes to
create in one iteration.|5000|
+|`druid.coordinator.curator.create.zknode.repeatDelay`|Delay before creating
next batch of zk nodes|PT1M|
Review comment:
Good question, @egor-ryashin. I ended up following the pattern we have for
http based segment loading. The general idea was to give historicals enough
time to load a segment before the coordinator tries to recreate the zookeeper
node for it. Keep in mind that segment loading is not purely I/O. It also
involves decompressing the segment files and mapping them to off-heap memory.
But yes, the general formula to come up with the numbers is what you suggested:
```
(batch_size * average_segment_size) / delay <= factor * node_network_throughput
```
IMHO, having separate parameters for batch size and delay makes more sense
since it forces users to think about which values to set instead of just
setting a single number like node_network_throughput. I can also update the
documentation with the above formula to aid users. Of course, most users
won't be configuring this, so having sane default values is important.
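To make the formula concrete, here is a rough Python sketch of how one might sanity-check a batch size and delay against it. The function names and every number in the usage note are hypothetical and chosen only for illustration; nothing here is part of Druid, and `factor` is assumed to be the fraction of a node's network throughput you are willing to spend on segment loading.

```python
import math


def required_throughput(batch_size, avg_segment_bytes, delay_seconds):
    """Bytes per second implied by loading one full batch every delay window.

    This is the left-hand side of the formula above.
    """
    return batch_size * avg_segment_bytes / delay_seconds


def fits(batch_size, avg_segment_bytes, delay_seconds,
         node_throughput_bytes_per_sec, factor):
    """Return True when the chosen batch size and delay satisfy the formula."""
    budget = factor * node_throughput_bytes_per_sec
    return required_throughput(batch_size, avg_segment_bytes, delay_seconds) <= budget


def max_batch_size(avg_segment_bytes, delay_seconds,
                   node_throughput_bytes_per_sec, factor):
    """Largest batch size that still satisfies the formula for a given delay."""
    budget_bytes = factor * node_throughput_bytes_per_sec * delay_seconds
    return math.floor(budget_bytes / avg_segment_bytes)
```

For example, with a hypothetical average segment size of 100 bytes, a 60 second delay, a node throughput of 1000 bytes per second and a factor of 0.5, `max_batch_size(100, 60, 1000, 0.5)` is the largest batch that stays within budget, and `fits` reports whether any given batch size does. Since loading also involves decompression and memory mapping, network throughput alone is only an upper bound, so a conservative factor seems sensible.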