Adar Dembo created KUDU-2052:
--------------------------------
Summary: Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
Key: KUDU-2052
URL: https://issues.apache.org/jira/browse/KUDU-2052
Project: Kudu
Issue Type: Bug
Components: util
Affects Versions: 1.4.0
Reporter: Adar Dembo
Assignee: Adar Dembo
Priority: Critical
One of the changes in Kudu 1.4 is more comprehensive repair functionality in
log block manager (LBM) startup. Among other things, this includes a heuristic
to detect whether an LBM container consumes more disk space than it should,
based on the live blocks in the container. If the heuristic fires, the LBM
reclaims the extra disk space by truncating the end of the container and
repunching all of the dead blocks in the container.
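
As a rough illustration of the kind of check described above (not Kudu's actual code), here is a minimal sketch in C. It assumes the space a container really occupies can be read from st_blocks and compared against the total size of its live blocks. The function names and the slack_fraction parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the number of bytes actually allocated on disk for fd, or -1 on
 * error. st_blocks is expressed in 512-byte units, independent of the
 * filesystem's block size. */
static int64_t allocated_bytes(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) {
    return -1;
  }
  return (int64_t)st.st_blocks * 512;
}

/* Hypothetical heuristic: flag the container when its allocated size
 * exceeds the live data by more than slack_fraction (e.g. 0.10 for 10%).
 * Negative inputs are treated as "unknown" and never flag. */
static bool container_needs_repair(int64_t allocated, int64_t live_bytes,
                                   double slack_fraction) {
  if (allocated < 0 || live_bytes < 0) {
    return false;
  }
  return (double)allocated > (double)live_bytes * (1.0 + slack_fraction);
}
```

A caller would pass allocated_bytes(fd) and the sum of live block lengths recorded in the container's metadata. A real implementation would also have to account for filesystem block alignment, which this sketch ignores.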
We brought up Kudu 1.4 on a large production cluster running xfs and observed
pathologically slow startup times. On one node, there was a three-hour gap
between the last bit of data directory processing and the end of LBM startup
as a whole. This time can only be attributed to hole repunching, which is
executed by the same set of thread pools that open the data directories.
Further research revealed that on xfs in el6, a hole punch via fallocate()
_always_ includes an fsync() (in the kernel), even if the underlying data was
already punched out. This isn't the case with ext4, nor does it appear to be
the case with xfs in more modern kernels (though this hasn't been confirmed).
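
For reference, the fallocate()-based hole punch discussed here generally looks like the sketch below. This is an illustration of the system call, not Kudu's implementation, and the wrapper name punch_hole_fallocate is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>         /* fallocate() */
#include <linux/falloc.h>  /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/types.h>

/* Deallocates [offset, offset + len) in fd without changing the file size.
 * Returns 0 on success or a negated errno value on failure.
 * FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE. */
static int punch_hole_fallocate(int fd, off_t offset, off_t len) {
  int rc;
  do {
    rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                   offset, len);
  } while (rc != 0 && errno == EINTR);  /* retry if interrupted by a signal */
  return rc == 0 ? 0 : -errno;
}
```

Nothing in this call asks for a sync. The cost described above comes from what the el6 xfs implementation does inside the kernel.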
xfs provides the [XFS_IOC_UNRESVSP64 ioctl|https://linux.die.net/man/3/xfsctl],
which can be used to deallocate space from a file. That sounds an awful lot
like hole punching, and some quick performance tests show that it doesn't incur
the cost of an fsync(). We should switch over to it when punching holes on xfs:
certainly on older (i.e. el6) kernels, and potentially everywhere for
simplicity's sake.
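
A minimal sketch of what the ioctl-based punch could look like follows. To stay self-contained it defines a local struct and request number that are assumed to mirror xfs_flock64 and XFS_IOC_UNRESVSP64 from the xfsprogs headers. Real code should include <xfs/xfs.h> and use those definitions directly. The names unresv_request, UNRESVSP64_REQUEST, make_unresv_request, and punch_hole_unresvsp are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <linux/ioctl.h>  /* _IOW */
#include <sys/ioctl.h>    /* ioctl() */

/* Assumed to match the layout of struct xfs_flock64 (see xfsctl(3)). */
struct unresv_request {
  int16_t  l_type;
  int16_t  l_whence;  /* 0 = offset is relative to the start of the file */
  int64_t  l_start;   /* starting offset of the range */
  int64_t  l_len;     /* length of the range in bytes */
  int32_t  l_sysid;
  uint32_t l_pid;
  int32_t  l_pad[4];  /* reserved */
};

/* Assumed to equal XFS_IOC_UNRESVSP64 as defined by xfsprogs. */
#define UNRESVSP64_REQUEST _IOW('X', 43, struct unresv_request)

/* Builds a zeroed request covering [offset, offset + len). */
static struct unresv_request make_unresv_request(int64_t offset, int64_t len) {
  struct unresv_request req;
  memset(&req, 0, sizeof(req));
  req.l_whence = 0;
  req.l_start = offset;
  req.l_len = len;
  return req;
}

/* Frees the given byte range from fd; the file size is left unchanged.
 * Returns 0 on success or a negated errno value on failure (for example
 * when the filesystem or kernel does not support the request). */
static int punch_hole_unresvsp(int fd, int64_t offset, int64_t len) {
  struct unresv_request req = make_unresv_request(offset, len);
  if (ioctl(fd, UNRESVSP64_REQUEST, &req) != 0) {
    return -errno;
  }
  return 0;
}
```

A caller that only wants this path on xfs would first check the filesystem type and fall back to fallocate() elsewhere. That dispatch is left out of the sketch.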