[ https://issues.apache.org/jira/browse/KUDU-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2047:
------------------------------
Component/s: compaction
> Lazy cfile open and maintenance op stat caching cause fruitful delta
> compaction ops to never run
> ------------------------------------------------------------------------------------------------
>
> Key: KUDU-2047
> URL: https://issues.apache.org/jira/browse/KUDU-2047
> Project: Kudu
> Issue Type: Bug
> Components: compaction, perf, tablet
> Affects Versions: 1.4.0
> Reporter: Todd Lipcon
> Assignee: William Berkeley
> Priority: Major
>
> I was just looking at a cluster that has a large amount of REDO data on some
> of its tablets, and wasn't sure why that data was never being compacted. The
> issue appears to be the following:
> - in DiskRowSet::DeltaStoresCompactionPerfImprovementScore(), we call through
> to GetColumnIdsWithUpdates() to see which columns may need compaction
> -- if the REDO delta block is not open (e.g. when the server has recently
> started), this will skip the unopened delta file's stats and not include them
> in the result
> -- we thus determine that the compaction is not fruitful
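The scoring path in the list above can be modeled with a small, hypothetical C++ sketch. `DeltaStoreModel`, `ColumnIdsWithUpdatesModel()` and `CompactionScoreModel()` are illustrative stand-ins assumed for this sketch, not Kudu's actual classes or signatures; the only behavior taken from the report is that unopened REDO stores are skipped, so their updates never raise the score.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified model of a REDO delta store (not a Kudu class).
// In this sketch a store's per-column update stats are only visible once
// the store has been opened.
struct DeltaStoreModel {
  bool opened = false;
  std::set<int> updated_column_ids;  // columns with updates in this store
};

// Models the behavior described for GetColumnIdsWithUpdates(): unopened
// stores are skipped rather than opened, so their stats are not reported.
std::set<int> ColumnIdsWithUpdatesModel(
    const std::vector<DeltaStoreModel>& stores) {
  std::set<int> result;
  for (const DeltaStoreModel& store : stores) {
    if (!store.opened) {
      continue;  // lazy open: do not pay the I/O cost just to compute stats
    }
    result.insert(store.updated_column_ids.begin(),
                  store.updated_column_ids.end());
  }
  return result;
}

// Hypothetical score: the number of distinct columns that have updates.
// A score of zero means the compaction is judged "not fruitful".
double CompactionScoreModel(const std::vector<DeltaStoreModel>& stores) {
  return static_cast<double>(ColumnIdsWithUpdatesModel(stores).size());
}
```

With two unopened stores holding updates to columns 1, 2 and 3, `CompactionScoreModel()` returns 0 until the stores are marked opened, after which it returns 3.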
> This was a conscious decision to prevent the maintenance manager (MM) from
> eagerly opening every delta on its first pass through computing compaction
> stats. We figured that, if the data were worth compacting, someone would
> probably scan it, forcing the deltas to be opened and thus making them
> eligible for compaction.
> However, the MM tries to be smart about caching the statistics (see
> e7fe0c1a94cac364522c09b8208c98480947d794). In particular, if it sees that the
> tablet has not run any flushes or compactions, it won't bother to recalculate
> the stats, assuming they haven't changed.
> So, if you have a completely read-only tablet with some uncompacted deltas,
> the MM op will never run.
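The interaction between lazy opening and stat caching can be sketched as follows. This is a hypothetical C++ model, not Kudu code: `TabletModel`, `CachedDeltaCompactionOp`, `ComputeScore()` and the `flush_or_compaction_count` field are assumed names standing in for the caching behavior the report attributes to commit e7fe0c1a94cac364522c09b8208c98480947d794.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified model; none of these names are Kudu APIs.
struct LazyDeltaStore {
  bool opened = false;
  std::set<int> updated_column_ids;
};

struct TabletModel {
  std::vector<LazyDeltaStore> redo_stores;
  int64_t flush_or_compaction_count = 0;  // bumped only by flushes/compactions

  // A scan forces the delta stores open but does not bump the counter.
  void Scan() {
    for (LazyDeltaStore& store : redo_stores) store.opened = true;
  }
};

// Counts distinct updated columns, skipping stores that are not yet open.
double ComputeScore(const TabletModel& tablet) {
  std::set<int> cols;
  for (const LazyDeltaStore& store : tablet.redo_stores) {
    if (store.opened) {
      cols.insert(store.updated_column_ids.begin(),
                  store.updated_column_ids.end());
    }
  }
  return static_cast<double>(cols.size());
}

// Models the MM's stat caching: the score is recomputed only when the
// tablet's flush/compaction count has changed since the last computation.
class CachedDeltaCompactionOp {
 public:
  explicit CachedDeltaCompactionOp(const TabletModel* tablet)
      : tablet_(tablet) {}

  double Score() {
    if (!has_cached_ ||
        tablet_->flush_or_compaction_count != cached_count_) {
      cached_score_ = ComputeScore(*tablet_);
      cached_count_ = tablet_->flush_or_compaction_count;
      has_cached_ = true;
    }
    return cached_score_;
  }

 private:
  const TabletModel* tablet_;
  bool has_cached_ = false;
  int64_t cached_count_ = 0;
  double cached_score_ = 0.0;
};
```

In this model a scan opens the deltas, so a fresh `ComputeScore()` would find work, but `Score()` keeps returning the cached 0 until a flush or compaction bumps the counter, which never happens on a read-only tablet.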
--
This message was sent by Atlassian Jira
(v8.3.4#803005)