paul-rogers commented on code in PR #13131:
URL: https://github.com/apache/druid/pull/13131#discussion_r992900167


##########
docs/operations/clean-metadata-store.md:
##########
@@ -143,7 +143,7 @@ Datasource cleanup uses the following configuration:
 
 ### Indexer task logs
 
-You can configure the Overlord to delete indexer task log metadata and the 
indexer task logs from local disk or from cloud storage.
+You can configure the Overlord to delete indexer task log metadata and the 
indexer task logs from local disk or from cloud storage.  The cleanup includes 
the `druid_tasks` and `druid_tasklogs` tables in the metadata database, and the 
task logs in deep storage.  (Note that `druid_tasklogs` is no longer used and 
will already be empty, unless the druid version is older.) 

Review Comment:
   Is this cleanup absolute? That is, does it delete everything, or only items 
older than some expiration age? (If it deletes everything, then there is a 
feature request to have that cutoff.)
   
   Once configured, does the cleanup run on a schedule, or after every task?
   
   I'd be very surprised if the cleanup actually "removes the `druid_tasks` and 
`druid_tasklogs` tables". This seems extreme, and introduces race conditions. 
Does it actually "drop all records in the `druid_tasks` and `druid_tasklogs` 
tables"?
   
   If we do take the brute-force, drop-all-info approach, then I'd suggest:
   
   > You can configure the Overlord to delete information for all indexer tasks 
that have either completed or failed. During cleanup, the Overlord drops all 
records from the `druid_tasks` and `druid_tasklogs` tables in the metadata 
database. The Overlord also removes all task logs from deep storage.
   
   What would be great is if we could say:
   
   > You can configure the Overlord to periodically expire (delete) indexer 
task information. The Overlord deletes tasks that have either completed or 
failed if those tasks are older than the expiration period. During cleanup, the 
Overlord drops expired records from the `druid_tasks` and `druid_tasklogs` 
tables in the metadata database. The Overlord also removes expired task logs 
from deep storage.
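
   To make the expiration semantics in that second wording concrete, here is a 
rough sketch of the selection rule I have in mind. It is not Druid code: 
`TaskRecord`, its fields, the status strings, and `select_expired` are all 
hypothetical names used only for illustration, and I have not checked this 
against what the Overlord actually does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical task record; the fields are illustrative, not Druid's schema.
@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    status: str        # assumed values: "RUNNING", "SUCCESS", "FAILED"
    created: datetime


# Only tasks in a terminal state are candidates for cleanup.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def select_expired(tasks, now, retention):
    """Return finished tasks created before (now - retention)."""
    cutoff = now - retention
    return [
        t for t in tasks
        if t.status in TERMINAL_STATUSES and t.created < cutoff
    ]
```

   Under this rule, a running task is never removed no matter how old it is, 
and a finished task survives until it ages past the retention period. The 
"brute-force" variant is the same rule with a retention of zero.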



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


