Adar Dembo created KUDU-1943:
--------------------------------
Summary: log containers should be reusable without first closing
in-flight writable blocks
Key: KUDU-1943
URL: https://issues.apache.org/jira/browse/KUDU-1943
Project: Kudu
Issue Type: Bug
Components: fs
Affects Versions: 1.3.0
Reporter: Adar Dembo
The log block manager has had a longstanding issue wherein a container can only
be reused by a new block once its outstanding writable block has been closed. Thing
is, we like to delay the close (and sync) of blocks until the very end of a
Kudu flush/compact operation, so as to maximize the amount of time that the
kernel has to asynchronously flush dirty pages out to disk. As a result, the
LBM can easily generate a thousand containers after flushing a very modest
tablet of ~30 columns. To be precise, the number of containers will be equal to
the number of rowsets produced by the flush (the flush threshold, 1 GB by
default, divided by the rowset size, 32 MB by default) multiplied by the number
of columns in the tablet. Coupled with the LBM's default preallocation buffer
size (32 MB), a single tablet flush can result in the tserver's space
consumption skyrocketing to roughly 32 GB.
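As a back-of-the-envelope check of the numbers above, here is a small Python sketch. The function names are made up for illustration and are not part of Kudu. It assumes one container per column per rowset, as described above, and treats 1 GB as 1024 MB.

```python
# Rough estimate of LBM container count and preallocated space per flush.
# All names here are hypothetical; the defaults are the ones quoted above.

def containers_per_flush(flush_threshold_mb, rowset_size_mb, num_columns):
    """Containers created by one flush: rowsets in the flush times columns."""
    rowsets = flush_threshold_mb // rowset_size_mb
    return rowsets * num_columns


def preallocated_mb(num_containers, prealloc_mb):
    """Disk space reserved if every container preallocates its full buffer."""
    return num_containers * prealloc_mb


if __name__ == "__main__":
    containers = containers_per_flush(
        flush_threshold_mb=1024,  # 1 GB flush threshold
        rowset_size_mb=32,        # 32 MB rowset size
        num_columns=30,           # "modest" tablet of ~30 columns
    )
    space_mb = preallocated_mb(containers, prealloc_mb=32)
    print(containers)       # 960
    print(space_mb / 1024)  # 30.0
```

With exactly 30 columns this gives 960 containers and 30 GB of preallocated space, which matches the rounded figures of "a thousand containers" and 32 GB.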
In and of itself this isn't fatal; the tserver will make use of this space over
time. But it's a pretty bad first impression for a novice who is trying to
calculate just how much disk space Kudu uses, and it means Kudu's disk space
consumption is very "bursty" instead of linear.