[
https://issues.apache.org/jira/browse/KUDU-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060149#comment-16060149
]
Adar Dembo commented on KUDU-2052:
----------------------------------
bq. What should we recommend to folks who are upgrading to 1.4 on xfs and el6?
Good question. Setting log_container_excess_space_before_cleanup_fraction
obscenely high will effectively disable the heuristic. The default value is
0.10, which means we'll only repunch containers whose size according to the
filesystem is at least 1.1x the size we think it should be (based on live
blocks). If you set the flag's value to something super high (like 100.0), the
size disparity has to reach 101x before repunching kicks in.
Note: the downside of doing this is full containers may occupy space on disk
that could otherwise be reclaimed by the filesystem.
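For upgraders, that workaround amounts to a single gflag. A hedged example (the flag name is from this comment; where you set it depends on how your deployment passes gflags to the tablet server):

```
# Effectively disable the excess-space heuristic: with a fraction of
# 100.0, repunching would only trigger at a 101x size disparity.
--log_container_excess_space_before_cleanup_fraction=100.0
```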
> Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
> --------------------------------------------------------------
>
> Key: KUDU-2052
> URL: https://issues.apache.org/jira/browse/KUDU-2052
> Project: Kudu
> Issue Type: Bug
> Components: util
> Affects Versions: 1.4.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Critical
>
> One of the changes in Kudu 1.4 is more comprehensive repair functionality
> in log block manager startup. Among other things, this includes a heuristic
> to detect whether an LBM container consumes more disk space than it should,
> based on the live blocks in the container. If the heuristic fires, the LBM
> reclaims the extra disk space by truncating the end of the container and
> repunching out all of the dead blocks in the container.
> We brought up Kudu 1.4 on a large production cluster running xfs and observed
> pathologically slow startup times. On one node, there was a three-hour gap
> between the last bit of data directory processing and the end of LBM startup
> in general. This time can only be attributed to hole repunching, which is
> executed by the same set of thread pools that open the data directories.
> Further research revealed that on xfs in el6, a hole punch via fallocate()
> _always_ includes an fsync() (in the kernel), even if the underlying data was
> already punched out. This isn't the case with ext4, nor does it appear to be
> the case with xfs in more modern kernels (though this hasn't been confirmed).
> xfs provides the [XFS_IOC_UNRESVSP64
> ioctl|https://linux.die.net/man/3/xfsctl], which can be used to deallocate
> space from a file. That sounds an awful lot like hole punching, and some
> quick performance tests show that it doesn't incur the cost of an fsync(). We
> should switch over to it when punching holes on xfs. Certainly on older (e.g.
> el6) kernels, and potentially everywhere for simplicity's sake.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)