[ 
https://issues.apache.org/jira/browse/KUDU-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642094#comment-16642094
 ] 

Will Berkeley commented on KUDU-2056:
-------------------------------------

A few ideas:

1. The one suggested by Todd: integrate the height function across the keyspace 
w.r.t. the keyspace probability measure. Since the height function is a step 
function, we can integrate it exactly while only considering endpoints of 
rowsets. The metric only needs to be updated when a rowset is created, dropped, 
or its bounds change, so only after flushes, merge compactions, and major delta 
compactions.
2. Once KUDU-1979 and KUDU-1625 are implemented or being worked on, some kind 
of measure of how much space could be freed by GC'ing deleted rows into undos 
that will eventually be GC'd themselves. Maybe a "ghost probability" where we 
estimate the probability that a random key in the keyspace is a deleted row. This 
would need to be considered alongside a cost measure if row GC will become 
another factor in compaction policy.
3. For extreme cases (which sadly seem more common than I'd hoped), some kind 
of measure of when seeks start to dominate over sequential IO in scans. 
Basically, this is the count of rowsets, but we need to appropriately scale it 
to the size of the tablet.
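
To make idea 1 concrete, here is a rough sketch of the integration. It is only an illustration, not Kudu code: it assumes rowset bounds have already been mapped to numbers, and that the keyspace probability measure is available as a CDF (uniform on [0, 1] by default). The names average_height, rowsets, and cdf are hypothetical.

```python
from typing import Callable, List, Tuple


def average_height(
    rowsets: List[Tuple[float, float]],
    cdf: Callable[[float], float] = lambda k: k,
) -> float:
    """Integrate the rowset height step function over the keyspace.

    rowsets: hypothetical (min_key, max_key) bounds, with keys already
             mapped to numbers (an assumption; real keys are encoded bytes).
    cdf:     assumed cumulative distribution for the keyspace probability
             measure; the default is uniform on [0, 1].
    """
    # The height changes only at rowset endpoints: +1 at a min key,
    # -1 at a max key.
    events = []
    for lo, hi in rowsets:
        events.append((lo, +1))
        events.append((hi, -1))
    events.sort()

    total = 0.0
    height = 0
    prev = None
    for key, delta in events:
        # Between consecutive endpoints the height is constant, so the
        # integral over that piece is height * measure(piece).
        if prev is not None and height > 0:
            total += height * (cdf(key) - cdf(prev))
        height += delta
        prev = key
    return total
```

For example, with bounds (0.0, 0.5), (0.25, 0.75), and (0.5, 1.0) under the uniform measure, the result is 1.5, which matches the sum of the measures of the individual rowsets. Because only endpoints are visited, the value would only need recomputing when a rowset is created, dropped, or has its bounds changed.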

> Expose a metric for how much a tablet needs to be compacted
> -----------------------------------------------------------
>
>                 Key: KUDU-2056
>                 URL: https://issues.apache.org/jira/browse/KUDU-2056
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet
>            Reporter: Jean-Daniel Cryans
>            Assignee: Will Berkeley
>            Priority: Major
>
> Now that the maintenance manager is fast at scheduling tasks, I've seen 
> clusters running 1.4 that are churning through compactions at a high rate 
> with seemingly no end in sight. At least it *feels* like it, but there's no 
> easy way to verify.
> I think it would be good to have some measure of how "uncompacted" a tablet 
> is. Todd thinks we could just use the average "height" of what's seen on the 
> "Rowset Layout Diagram" page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
