Re: [PR] Add new balancer implementation called EnumeratedDistribution Balancer (druid)

via GitHub Thu, 25 Apr 2024 12:22:24 -0700


capistrant commented on code in PR #16335:
URL: https://github.com/apache/druid/pull/16335#discussion_r1580020266



##########
docs/configuration/index.md:
##########
@@ -860,25 +860,25 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 
 ##### Coordinator operation
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`druid.coordinator.period`|The run period for the Coordinator. The Coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of "used" segments and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|`PT60S`|
-|`druid.coordinator.period.indexingPeriod`|How often to send compact/merge/conversion tasks to the indexing service. It's recommended to be longer than `druid.manager.segments.pollDuration`|`PT1800S` (30 mins)|
-|`druid.coordinator.startDelay`|The operation of the Coordinator works on the assumption that it has an up-to-date view of the state of the world when it runs, the current ZooKeeper interaction code, however, is written in a way that doesn’t allow the Coordinator to know for a fact that it’s done loading the current state of the world. This delay is a hack to give it enough time to believe that it has all the data.|`PT300S`|
-|`druid.coordinator.load.timeout`|The timeout duration for when the Coordinator assigns a segment to a Historical service.|`PT15M`|
-|`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all datasources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
-|`druid.coordinator.kill.on`|Boolean flag for whether or not the Coordinator should submit kill task for unused segments, that is, permanently delete them from metadata store and deep storage. If set to true, then for all whitelisted datasources (or optionally all), Coordinator will submit tasks periodically based on `period` specified. A whitelist can be set via dynamic configuration `killDataSourceWhitelist` described later.<br /><br />When `druid.coordinator.kill.on` is true, segments are eligible for permanent deletion once their data intervals are older than `druid.coordinator.kill.durationToRetain` relative to the current time. If a segment's data interval is older than this threshold at the time it is marked unused, it is eligible for permanent deletion immediately after being marked unused.|false|
-|`druid.coordinator.kill.period`| The frequency of sending kill tasks to the indexing service. The value must be greater than or equal to `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|Same as `druid.coordinator.period.indexingPeriod`|
-|`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be a ISO8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO8601 period, which would result in `now - durationToRetain` to be in the future.<br /><br />Note that the `durationToRetain` parameter applies to the segment interval, not the time that the segment was last marked unused. For example, if `durationToRetain` is set to `P90D`, then a segment for a time chunk 90 days in the past is eligible for permanent deletion immediately after being marked unused.|`P90D`|
-|`druid.coordinator.kill.ignoreDurationToRetain`|A way to override `druid.coordinator.kill.durationToRetain` and tell the coordinator that you do not care about the end date of unused segment intervals when it comes to killing them. If true, the coordinator considers all unused segments as eligible to be killed.|false|
-|`druid.coordinator.kill.bufferPeriod`|The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused.|`P30D`|
-|`druid.coordinator.kill.maxSegments`|The number of unused segments to kill per kill task. This number must be greater than 0. This only applies when `druid.coordinator.kill.on=true`.|100|
-|`druid.coordinator.balancer.strategy`|Specify the type of balancing strategy for the Coordinator to use to distribute segments among the Historical services. `cachingCost` is logically equivalent to `cost` but is more CPU-efficient on large clusters. `diskNormalized` weights the costs according to the servers' disk usage ratios - there are known issues with this strategy distributing segments unevenly across the cluster. `random` distributes segments among services randomly.|`cost`|
-|`druid.coordinator.balancer.cachingCost.awaitInitialization`|Whether to wait for segment view initialization before creating the `cachingCost` balancing strategy. This property is enabled only when `druid.coordinator.balancer.strategy` is `cachingCost`. If set to true, the Coordinator will not start to assign segments, until the segment view is initialized. If set to false, the Coordinator will fallback to use the `cost` balancing strategy only if the segment view is not initialized yet. It may take much time to wait for the initialization since the `cachingCost` balancing strategy involves much computing to build itself.|false|
-|`druid.coordinator.loadqueuepeon.repeatDelay`|The start and repeat delay for the `loadqueuepeon`, which manages the load and drop of segments.|`PT0.050S` (50 ms)|
-|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator service should act like an Overlord as well. This configuration allows users to simplify a Druid cluster by not having to deploy any standalone Overlord services. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also.|false|
-|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord services and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
-|`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling datasource schema building on the Coordinator. Note, when using MiddleManager to launch task, set `druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled` in MiddleManager runtime config. |false|
+| Property                                                     | Description   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                               | Default                        
                   |

Review Comment:
   My IntelliJ kept enforcing this table formatting. I'm trying to figure out whether that is set by the project's formatting policies and retroactively cleans up tables when they are modified, or whether it is something I can prevent so that this diff doesn't look so ridiculous.




