Will Berkeley created KUDU-2704:
-----------------------------------
Summary: Rowsets that are much bigger than the target size discourage compactions
Key: KUDU-2704
URL: https://issues.apache.org/jira/browse/KUDU-2704
Project: Kudu
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley
Assignee: Will Berkeley
In KUDU-2701, I fixed a KUDU-1400-related compaction loop. The size used for
compaction counted only the base data and redos, so compacting rowsets that
looked small but weren't was effectively a no-op, resulting in a compaction
loop. Now, rowset count / KUDU-1400 compactions use the whole rowset size.

While testing something on a table with 279 columns, I noticed that almost all
rowsets were being flushed at a size of 80-90MB and, even though the tablet
height was increasing rapidly and was above 20, almost no compactions were
happening. Looking into it, when the total size of a rowset is far above the
target size, we assign a large negative score to including that rowset in a
compaction, since the score is proportional to 1 - size/target size. This
problem has always existed; it just got worse because the size now includes
more things.
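
To make the arithmetic concrete, here is a minimal sketch of the size term
described above. It is not Kudu's actual code: the function name, the
proportionality constant of 1, and the 32MB target size are assumptions made
for illustration only.

```python
# Hypothetical illustration of the size term described in this issue.
# ASSUMPTION: the target size of 32 MB and the constant of proportionality
# (1.0) are illustrative values, not taken from the Kudu source.
ASSUMED_TARGET_SIZE_MB = 32.0


def size_score(rowset_size_mb: float,
               target_size_mb: float = ASSUMED_TARGET_SIZE_MB) -> float:
    """Return a score proportional to 1 - size/target size.

    Positive for rowsets smaller than the target, zero at the target,
    and increasingly negative as the rowset grows past the target.
    """
    return 1.0 - rowset_size_mb / target_size_mb


if __name__ == "__main__":
    # Rowsets flushed at 80-90MB, as observed on the 279-column table,
    # land well below zero under the assumed 32MB target.
    for size_mb in (8.0, 32.0, 80.0, 90.0):
        print(f"{size_mb:5.0f}MB -> score {size_score(size_mb):+.2f}")
```

Under these assumptions, every rowset in the 80-90MB range gets a score below
-1, so including it in a compaction looks strictly worse than leaving it out,
regardless of how tall the tablet has become.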
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)