Andrew Wong created KUDU-3153:
---------------------------------

             Summary: Use full DRS size when considering rowsets to compact
                 Key: KUDU-3153
                 URL: https://issues.apache.org/jira/browse/KUDU-3153
             Project: Kudu
          Issue Type: Bug
          Components: compaction, tserver
            Reporter: Andrew Wong
         Attachments: Screen Shot 2020-06-19 at 5.06.19 PM.png

We sometimes encounter puzzling behavior when viewing the rowset layout 
diagram, such as the quartiles indicating well-compacted (32MB-sized) rowsets 
while the compaction policy dump shows all rowsets as significantly undersized 
(around 10MB).

Looking through what's used where, a snippet from the patch for KUDU-2701 
indicates that the policy considers only the sizes of the base data and redo 
files, excluding the PK index and bloom filters:

{quote}
It's not totally clear to me why just base data and REDOs are used, ...
{quote}

After some spelunking, it seems like the usage of base data + redo file size 
stems from a time when DiskRowSet didn't have an interface to get the full size 
of the DRS, as seen in an [older version of 
RowSetInfo|https://github.com/apache/kudu/blame/6a12ba3f7d66dcf748e8864aae8139813c1c4746/src/kudu/tablet/rowset_info.cc#L256]
 and the [corresponding version of 
diskrowset.h|https://github.com/apache/kudu/blob/6a12ba3f7d66dcf748e8864aae8139813c1c4746/src/kudu/tablet/diskrowset.h#L335].

We should probably consider using the full size of the DRSs -- I suspect that 
would give us more useful estimates of the efficacy of a compaction, 
especially in the context of a "small rowset" compaction (see KUDU-1400).
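
To make the discrepancy concrete, here is a minimal sketch of the two size 
calculations. It is not taken from the Kudu source: the struct, the function 
names, and the per-component byte counts are all hypothetical, chosen only so 
that the totals match the 10MB and 32MB figures above. The real full DRS size 
may also include components not listed here.

```cpp
#include <cstdint>

constexpr uint64_t kMiB = 1024 * 1024;

// Hypothetical per-rowset size breakdown. Field names are illustrative and
// do not correspond to actual Kudu members.
struct DrsSizes {
  uint64_t base_data_bytes;
  uint64_t redo_bytes;
  uint64_t pk_index_bytes;
  uint64_t bloom_bytes;
};

// Size as described in this issue for the current policy: base data plus
// redo files only.
constexpr uint64_t PolicySizeBytes(const DrsSizes& s) {
  return s.base_data_bytes + s.redo_bytes;
}

// Proposed alternative: also count the PK index and bloom filters.
constexpr uint64_t FullSizeBytes(const DrsSizes& s) {
  return s.base_data_bytes + s.redo_bytes + s.pk_index_bytes + s.bloom_bytes;
}

// Made-up breakdown for a single rowset (assumption, not measured data).
constexpr DrsSizes kExample{8 * kMiB, 2 * kMiB, 14 * kMiB, 8 * kMiB};

static_assert(PolicySizeBytes(kExample) == 10 * kMiB,
              "policy sees an undersized rowset");
static_assert(FullSizeBytes(kExample) == 32 * kMiB,
              "full size matches the layout diagram");
```

With a breakdown like this, the same rowset looks well-compacted by its full 
size and undersized by the size the policy uses, which would be consistent 
with the mismatch described above.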



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
