[ 
https://issues.apache.org/jira/browse/AURORA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851688#comment-15851688
 ] 

Mehrdad Nurolahzade commented on AURORA-1866:
---------------------------------------------

With the 0.17.0 scheduler deployed in our production clusters, I am seeing 
5-60 seconds spent in the {{RowGarbageCollector.runOneIteration()}} method to 
prune 10-200 records.

> Reduce Storage Write Lock Contention in RowGarbageCollector
> -----------------------------------------------------------
>
>                 Key: AURORA-1866
>                 URL: https://issues.apache.org/jira/browse/AURORA-1866
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Mehrdad Nurolahzade
>            Priority: Minor
>
> {{RowGarbageCollector}} runs as a background service and deletes unreferenced 
> rows in the {{task_configs}} and {{job_keys}} tables (by default every two 
> hours). It does this by attempting to delete every existing row (one by one) 
> and silently ignoring the rows that fail deletion due to referential 
> integrity constraints. The entire operation happens while holding the 
> storage write lock. 
> We do not currently expose stats on the timing of this operation. There is a 
> ticket to expose such stats ([AURORA-1842]). If it proves to be an expensive 
> operation, we need to consider batching the operations or setting a cap on 
> the maximum number of rows garbage collected (and scheduling more frequently) 
> to reduce contention on the storage write lock.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
