maytasm commented on a change in pull request #11245:
URL: https://github.com/apache/druid/pull/11245#discussion_r631450186



##########
File path: docs/operations/clean-metadata-store.md
##########
@@ -0,0 +1,104 @@
+---
+id: clean-metadata-store
+title: "Automated cleanup for metadata records related to deleted datasources"
+sidebar_label: Automated metadata store cleanup
+description: "Defines a strategy to maintain Druid metadata store performance by automatically removing leftover records for deleted datasources. Most applicable to databases with 'high-churn' datasources."
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+When you delete a datasource from Apache Druid, some records related to the datasource may remain in the metadata store including:
+
+- audit records
+- supervisor records
+- rule records
+- compaction configuration records
+- datasource records created by supervisors
+
+If you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources, the leftover records can start to fill your metadata store and cause performance issues. To maintain metadata store performance in this case, you can configure Apache Druid to automatically remove records associated with deleted datasources from the metadata store.
+
+## Automated cleanup strategies
+There are several cases when you should consider automated cleanup of the metadata related to deleted datasources:
+- Proactively, if you know you have many high-churn datasources. For example you have scripts that create and delete supervisors regularly.
+- If you have issues with the hard disk for your metadata database filling up.
+- If you run into performance issues with the metadata database. For example API calls are very slow or fail to execute.
+
+Do not use the metadata store automated cleanup features if you have requirements to retain metadata records. For example, you have compliance requirements to keep audit records. In these cases, you should come up with an alternate method to preserve the audit metadata while freeing up your active metadata store.
+
+## Configure automated metadata cleanup
+Automated cleanup only removes records for deleted datasources. You can configure cleanup on a per-table basis as follows:
+ - `druid.coordinator.kill.*.on` enables cleanup for a particular metadata table.

Review comment:
       We should list out the available values for `*`.
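
One possible shape for that list, as a sketch only: assuming `*` maps one-to-one onto the record types listed at the top of the page, the per-table switches would look like the following. The property names are inferred from the `druid.coordinator.kill.*.on` pattern and should be checked against the Coordinator configuration reference before they go into the doc.

```properties
# Assumed expansions of druid.coordinator.kill.*.on, one per record type.
# Verify each name against the Coordinator configuration reference.
druid.coordinator.kill.audit.on=true
druid.coordinator.kill.supervisor.on=true
druid.coordinator.kill.rule.on=true
druid.coordinator.kill.compaction.on=true
druid.coordinator.kill.datasource.on=true
```

Listing each property explicitly, rather than leaving the wildcard, lets readers copy only the switches for the tables they want cleaned.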




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


