Adar Dembo created KUDU-1755:
--------------------------------

             Summary: Improve tablet disk space estimation
                 Key: KUDU-1755
                 URL: https://issues.apache.org/jira/browse/KUDU-1755
             Project: Kudu
          Issue Type: Bug
          Components: supportability, tablet
    Affects Versions: 1.1.0
            Reporter: Adar Dembo


(Prompted by [this user 
post|http://mail-archives.apache.org/mod_mbox/kudu-user/201611.mbox/%3Ctencent_201BBF963FB5CB2D7AF99E25%40qq.com%3E])

The on-disk size of tablets as reported by the Kudu web UI omits some minor as 
well as some major sources of space consumption. I'm listing them all here for 
posterity.
# Bloom file and composite index file usage. According to [this 
gerrit|https://gerrit.sjc.cloudera.com/#/c/6070/] (warning: internal link), 
the omission is deliberate: the rowset estimate is also used to determine how 
much IO would be generated were we to compact that rowset, and bloom/composite 
index files aren't touched in compaction.
# UNDO file usage. This seems like a more glaring omission, especially for 
mutation-heavy workloads like the one reported on the mailing list. However, 
the current REDO-only estimate factors into major delta compaction decision 
making by the maintenance manager, so there may be a good reason for it too.
# Log block manager block size rounding. The LBM rounds up Kudu blocks to the 
nearest filesystem block size to improve hole punching space reclamation. A 
side effect is that some space is lost to external fragmentation.
# Log block manager metadata overhead. Every container has a .metadata file, 
and we don't factor that into space utilization.
# Other files, such as the tablet superblock, WAL segments, and cmeta.
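
To make the third item concrete, the space lost to rounding can be sketched as 
below. This is a minimal illustration of the arithmetic only, not the log block 
manager's actual code: the function names are hypothetical, and the 4096-byte 
filesystem block size used in the example is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Kudu API): rounds a Kudu block's length up to
// the next multiple of the filesystem block size.
inline uint64_t RoundUpToFsBlock(uint64_t length, uint64_t fs_block_size) {
  assert(fs_block_size > 0);
  return ((length + fs_block_size - 1) / fs_block_size) * fs_block_size;
}

// Bytes that are occupied on disk but hold no data because of the rounding.
inline uint64_t RoundingOverhead(uint64_t length, uint64_t fs_block_size) {
  return RoundUpToFsBlock(length, fs_block_size) - length;
}
```

With an assumed 4096-byte filesystem block, a 5000-byte Kudu block occupies 
8192 bytes, so 3192 bytes go unreported if only the block's length is counted.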

I expect the first two items to be the largest, so we should work on addressing 
them. Let's decouple the UI-based estimate from the MM path so our reporting 
can be more accurate while still allowing the MM to make good decisions.
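
One possible shape for that decoupling is sketched below. It is only an 
illustration under stated assumptions: the struct, its field names, and both 
function names are hypothetical, and treating the existing estimate as "base 
data plus REDO deltas" is my reading of the items above rather than a 
description of the current code.

```cpp
#include <cstdint>

// Hypothetical per-rowset byte counts; none of these names come from Kudu.
struct RowSetSpace {
  uint64_t base_data = 0;
  uint64_t redo_deltas = 0;
  uint64_t undo_deltas = 0;
  uint64_t bloom_files = 0;
  uint64_t composite_index = 0;
};

// Estimate for the maintenance manager: counts only the data assumed to be
// covered by the existing estimate, so MM decisions are unchanged.
inline uint64_t CompactionEstimate(const RowSetSpace& s) {
  return s.base_data + s.redo_deltas;
}

// Estimate for reporting: adds the sources listed in the first two items.
inline uint64_t OnDiskEstimate(const RowSetSpace& s) {
  return CompactionEstimate(s) + s.undo_deltas + s.bloom_files +
         s.composite_index;
}
```

Keeping the two functions separate means the reporting estimate can grow to 
cover more sources without changing the number the MM sees.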




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
