[ https://issues.apache.org/jira/browse/AURORA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851688#comment-15851688 ]
Mehrdad Nurolahzade commented on AURORA-1866:
---------------------------------------------
Since deploying scheduler 0.17.0 to our production clusters, I am seeing
5-60 seconds spent in the {{RowGarbageCollector.runOneIteration()}} method to
prune 10-200 records.
> Reduce Storage Write Lock Contention in RowGarbageCollector
> -----------------------------------------------------------
>
> Key: AURORA-1866
> URL: https://issues.apache.org/jira/browse/AURORA-1866
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Mehrdad Nurolahzade
> Priority: Minor
>
> {{RowGarbageCollector}} runs as a background service and deletes unreferenced
> rows in the {{task_configs}} and {{job_keys}} tables (by default every two
> hours). It does this by attempting to delete every existing row, one by one,
> and silently ignoring the rows whose deletion fails due to referential
> integrity constraints. The entire operation happens while holding the storage
> write lock.
> We do not currently expose stats on the timing of this operation; there is a
> ticket to expose them ([AURORA-1842]). If it proves to be expensive, we should
> consider batching the deletions or capping the maximum number of rows garbage
> collected per run (and scheduling runs more frequently) to reduce contention
> on the storage write lock.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)