Will Berkeley created KUDU-2704:
-----------------------------------

             Summary: Rowsets that are much bigger than the target size discourage compactions
                 Key: KUDU-2704
                 URL: https://issues.apache.org/jira/browse/KUDU-2704
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Will Berkeley
            Assignee: Will Berkeley


In KUDU-2701, I fixed a KUDU-1400-related compaction loop. The size used for
compaction counted only a rowset's base data and redos, so some rowsets looked
small when they actually weren't. Compacting them was effectively a no-op, which
resulted in a compaction loop. Now, rowset-count (KUDU-1400) compactions use the
whole rowset size.

While testing something on a table with 279 columns, I noticed that almost all
rowsets were being flushed at a size of 80-90MB and that, even though the tablet
height was increasing rapidly and had passed 20, almost no compactions were
happening. Looking into it: when the total size of a rowset is far above the
target size, we assign a large negative score to including that rowset in a
compaction, since the score is proportional to 1 - (size / target size). This
problem always existed; it just got worse because the size now covers the whole
rowset rather than only the base data and redos.
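To make the scoring problem concrete, here is a minimal Python sketch. It is not Kudu's actual (C++) compaction policy code. It assumes the per-rowset size term is exactly 1 - size/target (the report only says the score is proportional to this), that a candidate compaction's size score is the plain sum of its rowsets' terms, and a 32 MiB target size chosen for illustration. The names size_term, clamped_size_term, and candidate_size_score are hypothetical.

```python
# Hypothetical sketch of the size-based scoring term described in KUDU-2704.
# Assumptions (not taken from Kudu's code): the term is exactly
# 1 - size/target, a candidate's score is the sum of its rowsets' terms,
# and the target rowset size is 32 MiB.

MIB = 1024 * 1024

# Assumed target rowset size, for illustration only.
TARGET_ROWSET_SIZE = 32 * MIB


def size_term(size_bytes: int, target_bytes: int = TARGET_ROWSET_SIZE) -> float:
    """Size-based term as described in the report: 1 - size/target.

    Positive for rowsets smaller than the target, zero at the target,
    and increasingly negative as a rowset grows past the target.
    """
    return 1.0 - size_bytes / target_bytes


def clamped_size_term(size_bytes: int, target_bytes: int = TARGET_ROWSET_SIZE) -> float:
    """Hypothetical variant that never goes below zero, for comparison only."""
    return max(0.0, size_term(size_bytes, target_bytes))


def candidate_size_score(sizes_bytes: list[int]) -> float:
    """Assumed aggregation: sum the size terms of every rowset in a candidate."""
    return sum(size_term(size) for size in sizes_bytes)


if __name__ == "__main__":
    # Compare small, on-target, and oversized rowsets (sizes in MiB).
    for mib in (8, 32, 80, 90):
        size = mib * MIB
        print(f"{mib:>3} MiB: raw={size_term(size):+.3f} "
              f"clamped={clamped_size_term(size):+.3f}")
    # Two small rowsets plus one oversized rowset in the same candidate.
    print("candidate:", candidate_size_score([8 * MIB, 8 * MIB, 80 * MIB]))
```

Under these assumptions an 80 MiB rowset scores 1 - 80/32 = -1.5, which exactly cancels the +0.75 contributed by each of two 8 MiB rowsets, so a candidate containing all three looks no better than compacting nothing. The clamped variant is only one conceivable way to keep oversized rowsets from dragging a candidate down; the report itself does not say how the issue should be fixed.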



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)