clintropolis opened a new pull request, #12727:
URL: https://github.com/apache/druid/pull/12727

   ### Description
   This PR fixes an issue where a datasource with no segments can become 
effectively stuck in `DruidSchema` in a stale state if it is in the 'needs 
refresh' list after all of its segments have been removed. The datasource 
would then remain in `INFORMATION_SCHEMA.TABLES` more or less permanently, 
until new segment activity took place for it. If the datasource was 
completely dropped, that might be never, so the table would stay present 
until the broker was restarted.
   
   This is the source of the `ITBroadcastJoinQueryTest` flakiness: whether or 
not the table was in the refresh list seemed to be the determining factor in 
whether the test flaked out.
   
   In the failure case, the broker would have a log message like this:
   ```
   2022-06-30T23:40:59,164 INFO [DruidSchema-Cache-0] 
org.apache.druid.sql.calcite.schema.DruidSchema - global dataSource 
[broadcast_join_wikipedia_test] has new signature: {}
   ```
   
   which was the clue I was looking for that trouble was afoot. There is 
probably an alternative way to fix this by removing stuff from the refresh 
list, but this way seemed to work well enough and ensures that we shouldn't 
ever run into stale tables by giving us another out to remove them.
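
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence described above. It is not the actual `DruidSchema` code: the class `SchemaCacheSketch`, its methods, and the `dropEmptyOnRefresh` flag are all hypothetical names invented for illustration, and the real change may be structured differently. It only models the idea that a refresh of a datasource with no remaining segments should remove the table instead of publishing an empty signature.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a broker-side schema cache. This is NOT Druid's DruidSchema.
class SchemaCacheSketch {
    // When true, refresh() removes tables that have no segments left (the idea behind the fix).
    private final boolean dropEmptyOnRefresh;
    // datasource -> (segment id -> column names of that segment)
    private final Map<String, Map<String, Set<String>>> segments = new HashMap<>();
    // datasource -> table signature (union of column names)
    private final Map<String, Set<String>> tables = new HashMap<>();
    // datasources whose signature must be rebuilt on the next refresh
    private final Set<String> needsRefresh = new HashSet<>();

    SchemaCacheSketch(boolean dropEmptyOnRefresh) {
        this.dropEmptyOnRefresh = dropEmptyOnRefresh;
    }

    void addSegment(String dataSource, String segmentId, Set<String> columns) {
        segments.computeIfAbsent(dataSource, k -> new HashMap<>()).put(segmentId, columns);
        needsRefresh.add(dataSource);
    }

    void markNeedsRefresh(String dataSource) {
        needsRefresh.add(dataSource);
    }

    void removeSegment(String dataSource, String segmentId) {
        Map<String, Set<String>> known = segments.get(dataSource);
        if (known == null) {
            return;
        }
        known.remove(segmentId);
        if (known.isEmpty()) {
            // Last segment is gone, so drop the table. The datasource may still be
            // sitting in needsRefresh, which is what sets up the stale-table case.
            segments.remove(dataSource);
            tables.remove(dataSource);
        }
    }

    void refresh() {
        for (String dataSource : needsRefresh) {
            Map<String, Set<String>> known = segments.get(dataSource);
            if (known == null && dropEmptyOnRefresh) {
                // No segments remain, so make sure the table is gone.
                tables.remove(dataSource);
                continue;
            }
            Set<String> signature = new TreeSet<>();
            if (known != null) {
                for (Set<String> columns : known.values()) {
                    signature.addAll(columns);
                }
            }
            // Without the check above, a dropped datasource is re-added here
            // with an empty signature, similar to the "{}" in the log message.
            tables.put(dataSource, signature);
        }
        needsRefresh.clear();
    }

    Set<String> tableNames() {
        return new TreeSet<>(tables.keySet());
    }
}
```

With `dropEmptyOnRefresh` set to `false`, adding a segment, refreshing, marking the datasource for refresh again, removing the segment, and refreshing once more leaves the table behind with an empty signature. With the flag set to `true`, the same sequence ends with no table.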
   
   Since making this change I have been unable to trigger the failure locally, 
so I will run it through Travis a few times to ensure it is indeed fixed. None 
of the changes made to `ITBroadcastJoinQueryTest` were in fact necessary; I 
just modified it to drop the segments more aggressively so that the test runs 
a little quicker rather than waiting on the load rules to drop them.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

