jihoonson commented on a change in pull request #11164:
URL: https://github.com/apache/druid/pull/11164#discussion_r622661391



##########
File path: 
server/src/main/java/org/apache/druid/metadata/MetadataRuleManager.java
##########
@@ -42,4 +42,12 @@
   List<Rule> getRulesWithDefault(String dataSource);
 
   boolean overrideRule(String dataSource, List<Rule> rulesConfig, AuditInfo 
auditInfo);
+
+  /**
+   * Remove rules for non-existence datasource (datasource with no segment) 
created older than the given timestamp.
+   *
+   * @param timestamp timestamp in milliseconds
+   * @return number of rules removed
+   */
+  int removeRulesOlderThan(long timestamp);

Review comment:
       Maybe `removeRulesForEmptyDatasourcesOlderThan` is better

##########
File path: 
server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java
##########
@@ -421,8 +422,35 @@ public Void inTransaction(Handle handle, TransactionStatus 
transactionStatus) th
     return true;
   }
 
+  @Override
+  public int removeRulesOlderThan(long timestamp)
+  {
+    DateTime dateTime = DateTimes.utc(timestamp);
+    synchronized (lock) {
+      return dbi.withHandle(
+          handle -> {
+            Update sql = handle.createStatement(
+                StringUtils.format(
+                    "DELETE FROM %1$s WHERE datasource NOT IN (SELECT DISTINCT 
datasource from %2$s) and datasource!=:default_rule and version < :date_time",

Review comment:
       Also, this query could be expensive when the segments table is large. I 
think maybe it's better to implement the join in this class. For example, this 
method can iterate all datasources in the rules table and check whether there 
is any entry in the segments table for each datasource. Then, you can use a 
much cheaper query for the existence check such as `SELECT datasource FROM 
segments where datasource = ${rule_datasource} LIMIT 1`. Maybe this approach 
needs to add a new index on `datasource` for the segments table.

##########
File path: 
server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java
##########
@@ -421,8 +422,35 @@ public Void inTransaction(Handle handle, TransactionStatus 
transactionStatus) th
     return true;
   }
 
+  @Override
+  public int removeRulesOlderThan(long timestamp)
+  {
+    DateTime dateTime = DateTimes.utc(timestamp);
+    synchronized (lock) {
+      return dbi.withHandle(
+          handle -> {
+            Update sql = handle.createStatement(
+                StringUtils.format(
+                    "DELETE FROM %1$s WHERE datasource NOT IN (SELECT DISTINCT 
datasource from %2$s) and datasource!=:default_rule and version < :date_time",

Review comment:
       After this change, the `version` must be the timestamp of when the rule 
entry was added. It means [this 
code](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java#L381)
 must remain the same as it is unless we change this query together. Please add 
comments on both codes so that we won't forget it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to