GWphua opened a new pull request, #19118:
URL: https://github.com/apache/druid/pull/19118

   ### Description
   
   Found this bug while debugging #19112.
   
   #### The bug 
   
   I was trying out MSQ compaction on an MM-less architecture. After creating a supervisor, no tasks are created. (That problem is not fixed in this PR; the Overlord appears to wait a very long time for the metadata cache sync.)
   
   The web console shows an error toast "Could not get compaction information". 
   
   The `GET /druid/indexer/v1/compaction/status/datasources` API on the Overlord
   throws a `NullPointerException` when compaction supervisors are enabled
   (MM-less ingestion) and the API is called before the compaction scheduler
   completes its first run.
   
   ```
   2026-03-10T02:47:53,278 ERROR [qtp1741034833-70] org.apache.druid.server.http.ServletResourceUtils - Error executing HTTP request
   java.lang.NullPointerException: Cannot invoke "java.util.Map.entrySet()" because "map" is null
        at java.base/java.util.Map.copyOf(Map.java:1734)
        at org.apache.druid.indexing.compact.OverlordCompactionScheduler.getAllCompactionSnapshots(OverlordCompactionScheduler.java:489)
        at org.apache.druid.indexing.overlord.http.OverlordCompactionResource.lambda$getAllCompactionSnapshots$1(OverlordCompactionResource.java:150)
   ...
   ```
   
   The root cause is that `OverlordCompactionScheduler.datasourceToCompactionSnapshot`
   is initialized as `new AtomicReference<>()`, so the reference initially holds `null`,
   while `getAllCompactionSnapshots()` calls `Map.copyOf(datasourceToCompactionSnapshot.get())`.
   The snapshot map is only populated after the first invocation of `resetCompactionJobQueue()`,
   which runs after the Overlord becomes leader and the scheduler fires its first cycle.
   
   The same issue also affects `getCompactionSnapshot(dataSource)` when an active
   supervisor exists but the scheduler has not yet run.
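
The failure mode described above can be reproduced outside Druid. The following is a minimal sketch, not the actual change in this PR: `SnapshotHolderSketch`, its fields, and its methods are hypothetical names, and the `String` map values stand in for the real snapshot type. It shows that a no-arg `AtomicReference` holds `null`, that `Map.copyOf(null)` throws, and two possible ways to avoid the exception (seeding the reference with an empty map, or guarding the read).

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the scheduler's snapshot holder; this is not Druid code.
final class SnapshotHolderSketch
{
  // A no-arg AtomicReference holds null until set() is called.
  private final AtomicReference<Map<String, String>> unseeded = new AtomicReference<>();

  // Possible fix 1 (assumption): seed the reference with an empty immutable map.
  private final AtomicReference<Map<String, String>> seeded = new AtomicReference<>(Map.of());

  // Mirrors the failing pattern: throws NullPointerException before the first publish().
  Map<String, String> getAllUnsafe()
  {
    return Map.copyOf(unseeded.get());
  }

  // Safe because the reference never holds null.
  Map<String, String> getAllSeeded()
  {
    return Map.copyOf(seeded.get());
  }

  // Possible fix 2 (assumption): keep the unseeded reference but guard the read.
  Map<String, String> getAllGuarded()
  {
    Map<String, String> current = unseeded.get();
    return current == null ? Map.of() : Map.copyOf(current);
  }

  // Stands in for the scheduler's first cycle publishing a snapshot map.
  void publish(Map<String, String> snapshot)
  {
    unseeded.set(snapshot);
    seeded.set(snapshot);
  }

  public static void main(String[] args)
  {
    SnapshotHolderSketch holder = new SnapshotHolderSketch();
    try {
      holder.getAllUnsafe();
    }
    catch (NullPointerException e) {
      System.out.println("Unsafe read failed before first publish");
    }
    System.out.println("Seeded read size: " + holder.getAllSeeded().size());
    System.out.println("Guarded read size: " + holder.getAllGuarded().size());
  }
}
```

Either approach lets the status API return an empty result until the scheduler has run, rather than failing the request. Which one fits better depends on whether callers need to distinguish "no snapshots yet" from "no datasources".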
   
   #### Release note
   Fix a bug where calls to the `GET /druid/indexer/v1/compaction/status/datasources` API fail with a `NullPointerException` after enabling compaction supervisors, until the compaction scheduler completes its first run.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `OverlordCompactionScheduler`
   
   <hr>
   
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] a release note entry in the PR description.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]