egor-ryashin commented on a change in pull request #7088: Improve parallelism
of zookeeper based segment change processing
URL: https://github.com/apache/incubator-druid/pull/7088#discussion_r268448365
##########
File path: docs/content/configuration/index.md
##########
@@ -1251,7 +1251,11 @@ These Historical configurations can be defined in the
`historical/runtime.proper
|`druid.segmentCache.infoDir`|Historical nodes keep track of the segments they
are serving so that when the process is restarted they can reload the same
segments without waiting for the Coordinator to reassign. This path defines
where this metadata is kept. Directory will be created if
needed.|${first_location}/info_dir|
|`druid.segmentCache.announceIntervalMillis`|How frequently to announce
segments while segments are loading from cache. Set this value to zero to wait
for all segments to be loaded before announcing.|5000 (5 seconds)|
|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load
concurrently from deep storage.|10|
-|`druid.segmentCache.numBootstrapThreads`|How many segments to load
concurrently from local storage at startup.|Same as numLoadingThreads|
+|`druid.coordinator.loadqueuepeon.curator.numCreateThreads`|Number of threads
creating zk nodes corresponding to segments that need to be loaded or
dropped.|10|
+|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of
threads for executing callback actions associated with loading or dropping of
segments.|2|
+|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads
to use for monitoring the deletion of zk nodes.|1|
+|`druid.coordinator.curator.create.zknode.batchSize`|Number of zk nodes to
create in one iteration.|5000|
+|`druid.coordinator.curator.create.zknode.repeatDelay`|Delay before creating
the next batch of zk nodes.|PT1M|
Review comment:
The problem with the formula is that `avg_segment_size` can vary over time; I
used the formula merely to find out whether we understand each other.
> having separate parameters for batch size and delay makes more sense since
it would force users to think more about how values to set instead of just
setting a number like node_network_throughput.
The thing is, `batch size` and `delay` indirectly depend on the data being
processed: if segments become larger, the Druid user has to somehow calculate
the average segment size, then reconfigure the Druid cluster and hope that the
average is not skewed in some way, whereas a `max_network_throughput` parameter
could be configured once and forgotten.
To put it another way: the Druid user has to know `max_network_throughput`
under both of the aforementioned implementations, but with the second
implementation the user has to do much less maintenance work.
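To make the maintenance burden concrete, here is a rough sketch of the
arithmetic a user would have to redo by hand whenever segment sizes drift:
deriving `druid.coordinator.curator.create.zknode.batchSize` from a target
network throughput, a measured average segment size, and a fixed
`repeatDelay`. The class and method names (`BatchSizeEstimator`,
`batchSizeFor`) are hypothetical illustrations, not part of Druid:

```java
import java.time.Duration;

// Hypothetical sketch (not Druid code): under the batch-size/delay scheme,
// effective load throughput ≈ batchSize * avgSegmentSize / repeatDelay,
// so the batch size that stays at or below a target throughput is
// batchSize = maxThroughput * repeatDelay / avgSegmentSize.
public class BatchSizeEstimator {
    // Largest batch size whose estimated throughput does not exceed the target.
    static long batchSizeFor(long maxThroughputBytesPerSec,
                             long avgSegmentSizeBytes,
                             Duration repeatDelay) {
        long bytesPerCycle = maxThroughputBytesPerSec * repeatDelay.getSeconds();
        // Never drop below one node per cycle, or loading would stall entirely.
        return Math.max(1, bytesPerCycle / avgSegmentSizeBytes);
    }

    public static void main(String[] args) {
        // Example: 1 GB/s target, 500 MB average segments, the default PT1M delay.
        long batch = batchSizeFor(1_000_000_000L, 500_000_000L, Duration.ofMinutes(1));
        System.out.println(batch); // prints 120
    }
}
```

If segments grow to 1 GB on average, the same target throughput would call for
a batch size of 60, which is exactly the reconfiguration step a single
`max_network_throughput` parameter would make unnecessary.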
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]