kfaraz opened a new pull request, #16959:
URL: https://github.com/apache/druid/pull/16959

   ### Description
   
   Coordinator logs are fairly noisy and do not provide much useful information.
   Below is a one-minute log snippet from a test cluster during normal operation.
   
   Even when the Coordinator misbehaves, these logs are not very useful.
   
   This patch lowers the level of some logs, removes some others entirely, and adds
   a new API for easier tracking of the Coordinator run status.
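   To make "lowering the level of some logs" concrete, here is a minimal sketch of one common pattern. It is written against `java.util.logging` rather than Druid's own logger, and it is not the code in this patch. A routine per-run summary is logged at `FINE` (debug level) and is promoted to `INFO` only when the run had errors or was slow. The class name, `SLOW_RUN_THRESHOLD`, and its 60-second value are hypothetical.

   ```java
   import java.time.Duration;
   import java.util.logging.Level;
   import java.util.logging.Logger;

   public class DutyRunLogging {
       private static final Logger LOG = Logger.getLogger(DutyRunLogging.class.getName());

       // Hypothetical threshold: runs slower than this are still reported at INFO.
       static final Duration SLOW_RUN_THRESHOLD = Duration.ofSeconds(60);

       /** Routine, fast, error-free runs are demoted to FINE so they are hidden by default. */
       static Level levelForRun(Duration runTime, boolean hadErrors) {
           if (hadErrors || runTime.compareTo(SLOW_RUN_THRESHOLD) > 0) {
               return Level.INFO;
           }
           return Level.FINE;
       }

       static void logRunFinished(String group, Duration runTime, boolean hadErrors) {
           // MessageFormat-style placeholders, as supported by Logger.log(Level, String, Object[]).
           LOG.log(levelForRun(runTime, hadErrors),
                   "Finished coordinator run for group [{0}] in [{1}] ms.",
                   new Object[]{group, runTime.toMillis()});
       }

       public static void main(String[] args) {
           // Logged at FINE: not shown by the default console handler (threshold INFO).
           logRunFinished("HistoricalManagementDuties", Duration.ofMillis(17), false);
           // Logged at INFO: the run exceeded the hypothetical slow-run threshold.
           logRunFinished("IndexingServiceDuties", Duration.ofSeconds(90), false);
       }
   }
   ```

   With this approach, an operator still sees the anomalous runs at the default level, and can enable debug logging to see every run.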
   
   The Coordinator already emits several metrics for monitoring the run status.
   This patch does not modify any of them, as they are already adequate.
   
   ```java
   2024-08-25T10:41:20,765 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Running CompactSegments duty
   2024-08-25T10:41:20,796 WARN [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.compact.DataSourceCompactibleSegmentIterator - Skipping compaction for datasource[wiki-rollup] as it has no compactible segments.
   2024-08-25T10:41:20,804 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Found [1] available task slots for compaction out of max compaction task capacity [0]
   2024-08-25T10:41:20,804 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Submitted a total of [0] compaction tasks.
   2024-08-25T10:41:20,809 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Emitted [14] stats for group [IndexingServiceDuties]. All collected stats:
   Debug: 14 hidden stats. Set 'debugDimensions' to see these.
   TOTAL: 14 stats for 4 dimension keys
   2024-08-25T10:41:20,810 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Finished coordinator run for group [IndexingServiceDuties] in [525] ms.
   
   2024-08-25T10:41:50,735 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Starting polling of segment and schema table.
   2024-08-25T10:41:50,757 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found [13] segments and [8] schemas in the database in [22] ms.
   2024-08-25T10:41:50,759 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Successfully created snapshot from polled segments in [1] ms. Found [0] overshadowed segments.
   2024-08-25T10:41:52,162 INFO [LookupCoordinatorManager--0] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
   2024-08-25T10:42:12,419 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found [2] rule(s) for [2] datasource(s).
   2024-08-25T10:42:20,284 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Starting coordinator run for group [HistoricalManagementDuties]
   2024-08-25T10:42:20,285 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Initialized run params for group [HistoricalManagementDuties] with [13] used segments in [6] datasources.
   2024-08-25T10:42:20,285 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.loading.SegmentLoadingConfig - Smart segment loading is enabled. Calculated replicationThrottleLimit[100] (5% of used segments[13]) and numBalancerThreads[1].
   2024-08-25T10:42:20,286 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.PrepareBalancerAndLoadQueues - Using balancer strategy [CostBalancerStrategy] with [1] threads.
   2024-08-25T10:42:20,287 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.RunRules - Applying retention rules on [13] used segments, skipping [0] overshadowed segments.
   2024-08-25T10:42:20,288 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.MarkOvershadowedSegmentsAsUnused - Skipping MarkAsUnused until [PT15M] have elapsed after coordinator start time[2024-08-25T10:39:50.266Z].
   2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.BalanceSegments - Computed maxSegmentsToMove[22] for total [22] segments on [3] historicals.
   2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.BalanceSegments - Balancing segments in tiers [[_default_tier]] with maxSegmentsToMove[22] and maxLifetime[60].
   2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.SegmentToMoveCalculator - Need to move [1] segments of avg size [0 MB] in tier[_default_tier] to fix disk usage gap between min[0 GB][0.0%] and max[0 GB][0.0%].
   2024-08-25T10:42:20,290 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.SegmentToMoveCalculator - Need to move [1] segments in tier[_default_tier] to attain balance. Allowed values are [min=22, max=22].
   2024-08-25T10:42:20,297 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.TierSegmentBalancer - Moved [2 of 22] segments from [3] [active] servers in tier [_default_tier].
   2024-08-25T10:42:20,297 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.CollectSegmentAndServerStats - Tier[_default_tier] is serving [22], loading [2] and dropping [0] segments across [3] historicals with average usage[0 GBs], [0.0%].
   2024-08-25T10:42:20,301 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Emitted [58] stats for group [HistoricalManagementDuties]. All collected stats:
   Debug: 60 hidden stats. Set 'debugDimensions' to see these.
   TOTAL: 60 stats for 34 dimension keys
   2024-08-25T10:42:20,301 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Finished coordinator run for group [HistoricalManagementDuties] in [17] ms.
   ```
   
   ### Changes
   
   - Add API `GET /druid/coordinator/v1/duties` that returns the status of all duty groups currently running on the Coordinator
   - Lower the level of some Coordinator logs and remove some others entirely
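   A client could poll the new endpoint roughly as in the following sketch, which uses the JDK's `java.net.http.HttpClient` (Java 11+). It assumes an unsecured Coordinator reachable at its default plaintext port, `http://localhost:8081`; a secured cluster would also need credentials, which are not shown. Because the response schema is not spelled out in this description, the body is treated as an opaque string. The class and method names are hypothetical.

   ```java
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.time.Duration;

   public class DutyStatusClient {
       // Path of the new endpoint, as described in this PR.
       static final String DUTIES_PATH = "/druid/coordinator/v1/duties";

       /** Builds the endpoint URI, tolerating a trailing slash on the base URL. */
       static URI dutiesUri(String coordinatorBaseUrl) {
           String base = coordinatorBaseUrl.endsWith("/")
                   ? coordinatorBaseUrl.substring(0, coordinatorBaseUrl.length() - 1)
                   : coordinatorBaseUrl;
           return URI.create(base + DUTIES_PATH);
       }

       /** Issues the GET request and returns the raw response body. */
       static String fetchDutyStatus(String coordinatorBaseUrl) throws Exception {
           HttpClient client = HttpClient.newBuilder()
                   .connectTimeout(Duration.ofSeconds(5))
                   .build();
           HttpRequest request = HttpRequest.newBuilder(dutiesUri(coordinatorBaseUrl))
                   .header("Accept", "application/json")
                   .GET()
                   .build();
           HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
           if (response.statusCode() != 200) {
               throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
           }
           return response.body();
       }

       public static void main(String[] args) throws Exception {
           // Assumed default: a local, unsecured Coordinator on port 8081.
           String base = args.length > 0 ? args[0] : "http://localhost:8081";
           System.out.println(fetchDutyStatus(base));
       }
   }
   ```

   Polling this endpoint periodically gives a quick view of which duty groups are running. The existing Coordinator metrics remain the primary signal for alerting.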
   
   ### Sample API Response
   
   
   ### Logs after the patch
   
   ```java
   
   ```
   
   ### Release notes
   Coordinator logs have been made less noisy.
   A new Coordinator API, `GET /druid/coordinator/v1/duties`, has been added to check the status of duties.
   
   ---
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   