[
https://issues.apache.org/jira/browse/KUDU-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442885#comment-15442885
]
Todd Lipcon commented on KUDU-1582:
-----------------------------------
I did a little analysis, and it seems like all the time is spent in the knapsack
solver algorithm.
I also grabbed the rowset layout from one of these big tablets and looked at it
more closely. It looks like we can optimize this significantly (8-10x at least)
by computing a lower-bound solution (which may not use the entirety of the
'knapsack budget') and comparing that to a computed upper bound. If the
lower-bound solution (which is very fast to compute) is within some percentage
of the upper bound, we can skip computing the more expensive exact knapsack
solution.
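To make the idea concrete, here is a rough sketch of how such a shortcut could look. This is not Kudu's actual code. It assumes a greedy pass over whole items (sorted by value density) for the lower bound and the fractional-knapsack relaxation for the upper bound. The names `greedy_bounds`, `exact_knapsack`, `solve`, and the `tolerance` parameter are hypothetical, as is the choice of a textbook 0/1 dynamic program as the "expensive" solver.

```python
from typing import List, Tuple

Item = Tuple[int, float]  # (weight, value); weights assumed to be positive integers


def greedy_bounds(items: List[Item], budget: int) -> Tuple[float, List[int], float]:
    """Return (lower_bound, chosen_indices, upper_bound) in one greedy pass."""
    # Visit items from highest to lowest value density (value per unit weight).
    order = sorted(range(len(items)),
                   key=lambda i: items[i][1] / items[i][0], reverse=True)
    lower, chosen, remaining = 0.0, [], budget
    upper, upper_remaining = 0.0, float(budget)
    for i in order:
        w, v = items[i]
        # Upper bound: fractional relaxation, may take part of an item.
        if upper_remaining > 0:
            take = min(w, upper_remaining)
            upper += v * take / w
            upper_remaining -= take
        # Lower bound: only whole items that still fit (may leave budget unused).
        if w <= remaining:
            chosen.append(i)
            lower += v
            remaining -= w
    return lower, chosen, upper


def exact_knapsack(items: List[Item], budget: int) -> Tuple[float, List[int]]:
    """Textbook 0/1 knapsack DP, O(len(items) * budget) time and space."""
    n = len(items)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (w, v) in enumerate(items, 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if w <= b and dp[i - 1][b - w] + v > dp[i][b]:
                dp[i][b] = dp[i - 1][b - w] + v
    # Walk back through the table to recover which items were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= items[i - 1][0]
    return dp[n][budget], sorted(chosen)


def solve(items: List[Item], budget: int,
          tolerance: float = 0.05) -> Tuple[float, List[int], bool]:
    """Return (value, chosen_indices, used_exact_solver)."""
    lower, chosen, upper = greedy_bounds(items, budget)
    # If the cheap solution is within `tolerance` of the best possible value,
    # accept it and skip the expensive solver.
    if upper == 0 or lower >= (1 - tolerance) * upper:
        return lower, sorted(chosen), False
    value, exact_chosen = exact_knapsack(items, budget)
    return value, exact_chosen, True
```

With `items = [(1, 10.0), (2, 18.0), (3, 24.0)]` and a budget of 6, everything fits, the two bounds coincide, and the exact solver is never called. With `items = [(6, 60.0), (5, 45.0), (5, 45.0)]` and a budget of 10, the greedy pass only reaches 60 against an upper bound of 96, so the sketch falls back to the exact solver.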
> maintenance manager scheduling very slow on TS with lots of data
> ----------------------------------------------------------------
>
> Key: KUDU-1582
> URL: https://issues.apache.org/jira/browse/KUDU-1582
> Project: Kudu
> Issue Type: Bug
> Components: perf, tserver
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Attachments: trace.json.gz
>
>
> On a server with ~5.5TB of data, the maintenance manager scheduler thread has
> gotten quite slow. The thread takes many tens of seconds to pick a
> maintenance operation, and then the actual operations take only a few seconds
> to run. So, the actual "duty cycle" of those threads is quite low, and
> compaction/flushing falls behind.