[ 
https://issues.apache.org/jira/browse/KUDU-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442885#comment-15442885
 ] 

Todd Lipcon commented on KUDU-1582:
-----------------------------------

Did a little analysis, and it seems like all the time's in the knapsack solver 
algorithm.

I also grabbed the rowset layout from one of these big tablets and looked at 
it more closely. It looks like we can optimize this significantly (8-10x at 
least) by computing a lower-bound solution (which may not use the entirety of 
the 'knapsack budget') and comparing it to a computed upper bound. If the 
lower-bound solution (which is very fast to compute) is within some percentage 
of the upper bound, we can skip running the more expensive knapsack solver.
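
To make the idea concrete, here is a minimal Python sketch of the bound 
comparison on a generic 0/1 knapsack. It is not Kudu's code: the function 
names, the (weight, value) item format, the greedy density ordering used for 
the lower bound, the fractional relaxation used for the upper bound, and the 
5% tolerance are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Item = Tuple[int, float]  # (weight, value); weight assumed to be a positive integer


def _by_density(items: List[Item]) -> List[int]:
    # Item indices ordered by value per unit weight, best first.
    return sorted(range(len(items)),
                  key=lambda i: items[i][1] / items[i][0], reverse=True)


def greedy_lower_bound(items: List[Item], budget: int) -> Tuple[float, List[int]]:
    # Take whole items in density order, skipping any that no longer fit.
    # The result is feasible, so its value is a lower bound on the optimum;
    # it may leave part of the budget unused.
    value, used, chosen = 0.0, 0, []
    for i in _by_density(items):
        w, v = items[i]
        if used + w <= budget:
            used += w
            value += v
            chosen.append(i)
    return value, sorted(chosen)


def fractional_upper_bound(items: List[Item], budget: int) -> float:
    # Relaxation: allow a fraction of the first item that does not fit.
    # No 0/1 solution can beat this value.
    value, remaining = 0.0, budget
    for i in _by_density(items):
        w, v = items[i]
        if w <= remaining:
            remaining -= w
            value += v
        else:
            value += v * remaining / w
            break
    return value


def exact_knapsack(items: List[Item], budget: int) -> Tuple[float, List[int]]:
    # Standard O(n * budget) dynamic program; this is the expensive path.
    best = [0.0] * (budget + 1)
    take = [[False] * (budget + 1) for _ in items]
    for i, (w, v) in enumerate(items):
        for cap in range(budget, w - 1, -1):
            if best[cap - w] + v > best[cap]:
                best[cap] = best[cap - w] + v
                take[i][cap] = True
    chosen, cap = [], budget
    for i in range(len(items) - 1, -1, -1):
        if take[i][cap]:
            chosen.append(i)
            cap -= items[i][0]
    return best[budget], sorted(chosen)


def solve(items: List[Item], budget: int,
          tolerance: float = 0.05) -> Tuple[float, List[int], str]:
    # Use the cheap solution when it is provably close to the optimum.
    lower, chosen = greedy_lower_bound(items, budget)
    upper = fractional_upper_bound(items, budget)
    if lower >= (1.0 - tolerance) * upper:
        return lower, chosen, "greedy"
    value, chosen = exact_knapsack(items, budget)
    return value, chosen, "exact"
```

Because the optimum always lies between the two bounds, a lower bound within 
the tolerance of the upper bound is also within that tolerance of the optimum, 
which is what makes skipping the exact solver safe in this sketch.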

> maintenance manager scheduling very slow on TS with lots of data
> ----------------------------------------------------------------
>
>                 Key: KUDU-1582
>                 URL: https://issues.apache.org/jira/browse/KUDU-1582
>             Project: Kudu
>          Issue Type: Bug
>          Components: perf, tserver
>    Affects Versions: 0.10.0
>            Reporter: Todd Lipcon
>         Attachments: trace.json.gz
>
>
> On a server with ~5.5TB of data, the maintenance manager scheduler thread has 
> gotten quite slow. The thread takes many tens of seconds to pick a 
> maintenance operation, and then the actual operations take only a few seconds 
> to run. So, the actual "duty cycle" of those threads is quite low, and 
> compaction/flushing falls behind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
