kfaraz opened a new pull request, #16959:
URL: https://github.com/apache/druid/pull/16959
### Description
Coordinator logs are fairly noisy and don't give much useful information.
The snippet below shows 1 minute of logs from a test cluster during normal operations.
Even when the Coordinator misbehaves, these logs are not very useful.

This patch lowers the level of some logs, removes some others entirely, and adds
a new API for easier tracking of Coordinator run status.

The Coordinator already emits several metrics to monitor the run status.
None of these metrics are modified in this patch as they are already adequate.
```text
2024-08-25T10:41:20,765 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Running CompactSegments duty
2024-08-25T10:41:20,796 WARN [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.compact.DataSourceCompactibleSegmentIterator - Skipping compaction for datasource[wiki-rollup] as it has no compactible segments.
2024-08-25T10:41:20,804 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Found [1] available task slots for compaction out of max compaction task capacity [0]
2024-08-25T10:41:20,804 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Submitted a total of [0] compaction tasks.
2024-08-25T10:41:20,809 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Emitted [14] stats for group [IndexingServiceDuties]. All collected stats:
Debug: 14 hidden stats. Set 'debugDimensions' to see these.
TOTAL: 14 stats for 4 dimension keys
2024-08-25T10:41:20,810 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Finished coordinator run for group [IndexingServiceDuties] in [525] ms.
2024-08-25T10:41:50,735 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Starting polling of segment and schema table.
2024-08-25T10:41:50,757 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found [13] segments and [8] schemas in the database in [22] ms.
2024-08-25T10:41:50,759 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Successfully created snapshot from polled segments in [1] ms. Found [0] overshadowed segments.
2024-08-25T10:41:52,162 INFO [LookupCoordinatorManager--0] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
2024-08-25T10:42:12,419 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found [2] rule(s) for [2] datasource(s).
2024-08-25T10:42:20,284 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Starting coordinator run for group [HistoricalManagementDuties]
2024-08-25T10:42:20,285 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Initialized run params for group [HistoricalManagementDuties] with [13] used segments in [6] datasources.
2024-08-25T10:42:20,285 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.loading.SegmentLoadingConfig - Smart segment loading is enabled. Calculated replicationThrottleLimit[100] (5% of used segments[13]) and numBalancerThreads[1].
2024-08-25T10:42:20,286 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.PrepareBalancerAndLoadQueues - Using balancer strategy [CostBalancerStrategy] with [1] threads.
2024-08-25T10:42:20,287 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.RunRules - Applying retention rules on [13] used segments, skipping [0] overshadowed segments.
2024-08-25T10:42:20,288 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.MarkOvershadowedSegmentsAsUnused - Skipping MarkAsUnused until [PT15M] have elapsed after coordinator start time[2024-08-25T10:39:50.266Z].
2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.BalanceSegments - Computed maxSegmentsToMove[22] for total [22] segments on [3] historicals.
2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.BalanceSegments - Balancing segments in tiers [[_default_tier]] with maxSegmentsToMove[22] and maxLifetime[60].
2024-08-25T10:42:20,289 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.SegmentToMoveCalculator - Need to move [1] segments of avg size [0 MB] in tier[_default_tier] to fix disk usage gap between min[0 GB][0.0%] and max[0 GB][0.0%].
2024-08-25T10:42:20,290 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.SegmentToMoveCalculator - Need to move [1] segments in tier[_default_tier] to attain balance. Allowed values are [min=22, max=22].
2024-08-25T10:42:20,297 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.balancer.TierSegmentBalancer - Moved [2 of 22] segments from [3] [active] servers in tier [_default_tier].
2024-08-25T10:42:20,297 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.duty.CollectSegmentAndServerStats - Tier[_default_tier] is serving [22], loading [2] and dropping [0] segments across [3] historicals with average usage[0 GBs], [0.0%].
2024-08-25T10:42:20,301 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Emitted [58] stats for group [HistoricalManagementDuties]. All collected stats:
Debug: 60 hidden stats. Set 'debugDimensions' to see these.
TOTAL: 60 stats for 34 dimension keys
2024-08-25T10:42:20,301 INFO [Coordinator-Exec-HistoricalManagementDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Finished coordinator run for group [HistoricalManagementDuties] in [17] ms.
```
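
To see which loggers contribute most to a snippet like the one above, a small tally by logger name can help. The following is only a minimal sketch and not part of this patch: the class and method names (`LogNoiseTally`, `countByLogger`) are hypothetical, and it assumes each log entry sits on a single line in the layout shown above (timestamp, level, thread name in brackets, logger, ` - `, message). Continuation lines such as `Debug: ...` and `TOTAL: ...` are skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Druid: counts log entries per logger name.
public class LogNoiseTally {
    // Assumed layout: <timestamp> <LEVEL> [<thread>] <logger> - <message>
    private static final Pattern ENTRY = Pattern.compile(
            "^\\S+\\s+(\\w+)\\s+\\[([^\\]]+)\\]\\s+(\\S+)\\s+-\\s+(.*)$");

    /** Returns a map of logger name to number of matching entries, sorted by logger name. */
    static Map<String, Integer> countByLogger(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                // Group 3 is the logger (fully qualified class name).
                counts.merge(m.group(3), 1, Integer::sum);
            }
            // Lines that do not match (multi-line message continuations) are ignored.
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three entries and one continuation line copied from the snippet above.
        List<String> sample = List.of(
                "2024-08-25T10:41:20,765 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Running CompactSegments duty",
                "2024-08-25T10:41:20,804 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.duty.CompactSegments - Submitted a total of [0] compaction tasks.",
                "2024-08-25T10:41:20,809 INFO [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.coordinator.DruidCoordinator - Emitted [14] stats for group [IndexingServiceDuties]. All collected stats:",
                "Debug: 14 hidden stats. Set 'debugDimensions' to see these.");
        countByLogger(sample).forEach((logger, n) -> System.out.println(n + " " + logger));
    }
}
```

Keying on the logger rather than the thread name groups entries by the class that wrote them, which is the level at which log verbosity is typically adjusted.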
### Changes
- Add API `GET /druid/coordinator/v1/duties` that returns a status list of
all duty groups currently running on the Coordinator
-
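
A client could query the new endpoint roughly as follows. This is a minimal sketch under stated assumptions, not code from this patch: the class name `CoordinatorDutyStatus` is hypothetical, the base URL (for example `http://localhost:8081`) must be supplied by the caller, and the cluster is assumed to require no authentication. Because the response format is not shown in this description, the sketch prints the raw response body without parsing it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client, not part of Druid: fetches the duty status from a Coordinator.
public class CoordinatorDutyStatus {
    // Endpoint path as given in this PR description.
    static final String DUTIES_PATH = "/druid/coordinator/v1/duties";

    /** Builds the GET request for the given Coordinator base URL (an assumption of the caller). */
    static HttpRequest buildRequest(String baseUrl) {
        // Drop a trailing slash so the path is not doubled up.
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(trimmed + DUTIES_PATH))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body; fails on any non-200 status. */
    static String fetchDutyStatus(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: CoordinatorDutyStatus <coordinator-base-url>");
            return;
        }
        System.out.println(fetchDutyStatus(args[0]));
    }
}
```

Separating `buildRequest` from `fetchDutyStatus` keeps the URL construction checkable without a running cluster.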
### Sample API Response
### Logs after the patch
```java
```
### Release notes
Coordinator logs have been made less noisy.
A new Coordinator API, `GET /druid/coordinator/v1/duties`, has been added to check the status of duty groups.
---
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.