[ https://issues.apache.org/jira/browse/KUDU-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley resolved KUDU-2001.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Fixed in 1aa0ebb91fc072b4cd3ed629721b8263a6ba1ffe

> Metric on_disk_size does not include UNDO deltas
> ------------------------------------------------
>
>                 Key: KUDU-2001
>                 URL: https://issues.apache.org/jira/browse/KUDU-2001
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 1.3.1
>            Reporter: Mike Percy
>            Assignee: Will Berkeley
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Kudu has a (misleadingly named) metric called {{on_disk_size}} defined in 
> tablet.cc with the metric description "Tablet size on disk".
> As of 1.3.1, this metric counts only the bytes in the base data and REDO
> deltas of the DiskRowSets, plus the data in the MemRowSet. It does not
> include UNDO deltas, nor does it include the WALs or other metadata files.
> The easiest way to improve this situation is to change the metric's
> description to "Space used by this tablet's data blocks" and add UNDO
> deltas to the count. However, that takes two steps, because the same
> estimation code also feeds a compaction heuristic, as described below.
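To make the gap concrete, here is a minimal sketch of the two accounting rules described above. The types `RowSetSizes` and `TabletSizes` and both functions are hypothetical simplifications for illustration, not Kudu's real classes; only the set of components summed follows the issue text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-DiskRowSet byte counts (not Kudu's real types).
struct RowSetSizes {
  uint64_t base_data;    // base data blocks
  uint64_t redo_deltas;  // REDO delta files
  uint64_t undo_deltas;  // UNDO delta files
};

// Hypothetical tablet-level view: the MemRowSet plus its DiskRowSets.
struct TabletSizes {
  uint64_t memrowset;
  std::vector<RowSetSizes> rowsets;
};

// What on_disk_size reports as of 1.3.1: MemRowSet + base data + REDO deltas.
uint64_t CurrentOnDiskSize(const TabletSizes& t) {
  uint64_t size = t.memrowset;
  for (const RowSetSizes& rs : t.rowsets) {
    size += rs.base_data + rs.redo_deltas;
  }
  return size;
}

// The proposed change: the same sum plus UNDO deltas.
// WALs and other metadata files remain excluded in both versions.
uint64_t ProposedOnDiskSize(const TabletSizes& t) {
  uint64_t size = CurrentOnDiskSize(t);
  for (const RowSetSizes& rs : t.rowsets) {
    size += rs.undo_deltas;
  }
  return size;
}
```

For a tablet whose UNDO history is large relative to its base data, the gap between the two numbers can be substantial, which is why the current metric can understate real disk usage.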
> The metric is currently tied to Tablet::EstimateOnDiskSize(). If you trace 
> that down to the DiskRowSet you will end up at a function in DiskRowSet:
> {code}
> uint64_t DiskRowSet::EstimateOnDiskSize() const {
>   DCHECK(open_);
>   shared_lock<rw_spinlock> l(component_lock_);
>   return base_data_->EstimateOnDiskSize() +
>          delta_tracker_->EstimateOnDiskSize();
> }
> {code}
> In the DeltaTracker, you can see that we are only counting REDO deltas, not 
> UNDO deltas:
> {code}
> uint64_t DeltaTracker::EstimateOnDiskSize() const {
>   shared_lock<rw_spinlock> lock(component_lock_);
>   uint64_t size = 0;
>   for (const shared_ptr<DeltaStore>& ds : redo_delta_stores_) {
>     size += ds->EstimateSize();
>   }
>   return size;
> }
> {code}
> However, this function is also used by the maintenance manager (MM) op
> MajorDeltaCompactionOp::UpdateStats(), which eventually calls
> DiskRowSet::DeltaStoresCompactionPerfImprovementScore() (which returns a
> double). That function calls EstimateDeltaDiskSize(), which is implemented
> as follows:
> {code}
> uint64_t DiskRowSet::EstimateDeltaDiskSize() const {
>   DCHECK(open_);
>   shared_lock<rw_spinlock> l(component_lock_);
>   return delta_tracker_->EstimateOnDiskSize();
> }
> {code}
> So, to avoid breaking that estimate, we need to separate the two: provide
> one way to estimate the size of the REDO deltas alone and another to
> estimate the size of all deltas (REDO and UNDO) in a RowSet.
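One possible way to separate the two estimates, sketched in simplified form: keep a REDO-only estimate for the major delta compaction heuristic, and have the on-disk estimate add UNDO deltas on top. The `DeltaStore` stand-in, the `undo_delta_stores_` member, and the method name `EstimateRedoDeltaSize()` are assumptions for illustration; locking via `component_lock_` is omitted, and the actual change in 1aa0ebb may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Minimal stand-in for a delta store that knows its estimated size.
class DeltaStore {
 public:
  explicit DeltaStore(uint64_t size) : size_(size) {}
  uint64_t EstimateSize() const { return size_; }

 private:
  uint64_t size_;
};

// Simplified DeltaTracker (hypothetical); the real one also takes
// component_lock_, which is omitted here.
class DeltaTracker {
 public:
  DeltaTracker(std::vector<std::shared_ptr<DeltaStore>> redos,
               std::vector<std::shared_ptr<DeltaStore>> undos)
      : redo_delta_stores_(std::move(redos)),
        undo_delta_stores_(std::move(undos)) {}

  // REDO deltas only: what the compaction score would keep using.
  uint64_t EstimateRedoDeltaSize() const {
    return SumSizes(redo_delta_stores_);
  }

  // REDO + UNDO deltas: what the on_disk_size metric would report.
  uint64_t EstimateOnDiskSize() const {
    return SumSizes(redo_delta_stores_) + SumSizes(undo_delta_stores_);
  }

 private:
  static uint64_t SumSizes(
      const std::vector<std::shared_ptr<DeltaStore>>& stores) {
    uint64_t size = 0;
    for (const auto& ds : stores) {
      size += ds->EstimateSize();
    }
    return size;
  }

  std::vector<std::shared_ptr<DeltaStore>> redo_delta_stores_;
  std::vector<std::shared_ptr<DeltaStore>> undo_delta_stores_;
};
```

With a split like this, DiskRowSet::EstimateDeltaDiskSize() could call the REDO-only method, leaving the MajorDeltaCompactionOp score unchanged, while DiskRowSet::EstimateOnDiskSize() would pick up UNDO deltas through the tracker's full estimate.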



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
