[
https://issues.apache.org/jira/browse/KUDU-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060114#comment-16060114
]
Adar Dembo commented on KUDU-2052:
----------------------------------
The following experiments were run using a single spinning HDD on a CentOS 6.6
machine. First, a filesystem was created in a 10G file and mounted with -o
loop. Inside the filesystem, a 1G test file was created using dd from
/dev/urandom. The first 100M of the test file were punched out via fallocate,
sync was run, and then the following repunching tests were run in a loop under
perf stat.
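For reference, the repunch being timed corresponds roughly to the following
(a minimal C sketch, assuming the 1G test file "foo" from the setup above;
the tests themselves invoked the fallocate(1) and xfs_io(8) utilities, as
shown in the perf output below):
{noformat}
#define _GNU_SOURCE  /* for fallocate() and the FALLOC_FL_* flags */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  int fd = open("foo", O_WRONLY);
  if (fd < 0) {
    perror("open");
    return 1;
  }
  /* Punch out the first 100M again. FALLOC_FL_PUNCH_HOLE must be
     paired with FALLOC_FL_KEEP_SIZE, so the file size stays 1G. */
  if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                0, 100 * 1024 * 1024) < 0) {
    perror("fallocate");
    return 1;
  }
  close(fd);
  return 0;
}
{noformat}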
ext4 filesystem with fallocate-based hole punching:
{noformat}
 Performance counter stats for 'fallocate -p -o 0 -l 100M foo' (1000 runs):

       0.269927 task-clock                #  0.635 CPUs utilized          ( +-  0.33% )
              0 context-switches          #  0.000 K/sec
              0 cpu-migrations            #  0.041 K/sec                  ( +- 30.00% )
            150 page-faults               #  0.555 M/sec                  ( +-  0.01% )
        770,390 cycles                    #  2.854 GHz                    ( +-  0.33% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
        477,294 instructions              #  0.62  insns per cycle        ( +-  0.04% )
         98,535 branches                  # 365.044 M/sec                 ( +-  0.03% )
          3,979 branch-misses             #  4.04% of all branches        ( +-  0.61% )

    0.000425207 seconds time elapsed                                      ( +-  0.45% )
{noformat}
xfs filesystem with fallocate-based hole punching:
{noformat}
 Performance counter stats for 'fallocate -p -o 0 -l 100M foo' (1000 runs):

       0.403296 task-clock                #  0.013 CPUs utilized          ( +-  0.32% )
              2 context-switches          #  0.005 M/sec                  ( +-  0.17% )
              0 cpu-migrations            #  0.017 K/sec                  ( +- 37.68% )
            150 page-faults               #  0.371 M/sec                  ( +-  0.01% )
      1,112,706 cycles                    #  2.759 GHz                    ( +-  0.17% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
        505,027 instructions              #  0.45  insns per cycle        ( +-  0.03% )
        103,652 branches                  # 257.013 M/sec                 ( +-  0.02% )
          5,750 branch-misses             #  5.55% of all branches        ( +-  0.04% )

    0.031220273 seconds time elapsed                                      ( +-  0.57% )
{noformat}
xfs filesystem with XFS_IOC_UNRESVSP64-based hole punching:
{noformat}
 Performance counter stats for 'xfs_io -c unresvsp 0 104857600 foo' (1000 runs):

       0.477930 task-clock                #  0.677 CPUs utilized          ( +-  0.28% )
              0 context-switches          #  0.004 K/sec                  ( +- 70.68% )
              0 cpu-migrations            #  0.017 K/sec                  ( +- 39.42% )
            215 page-faults               #  0.449 M/sec                  ( +-  0.01% )
      1,463,629 cycles                    #  3.062 GHz                    ( +-  0.15% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
      1,150,346 instructions              #  0.79  insns per cycle        ( +-  0.01% )
        241,338 branches                  # 504.964 M/sec                 ( +-  0.01% )
          9,753 branch-misses             #  4.04% of all branches        ( +-  0.02% )

    0.000706070 seconds time elapsed                                      ( +-  0.36% )
{noformat}
To summarize the elapsed times: the fallocate-based punch averages ~0.43ms per
run on ext4 and ~31ms on xfs (the cost of the implicit fsync), while the
XFS_IOC_UNRESVSP64-based punch on xfs averages ~0.71ms.
> Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
> --------------------------------------------------------------
>
> Key: KUDU-2052
> URL: https://issues.apache.org/jira/browse/KUDU-2052
> Project: Kudu
> Issue Type: Bug
> Components: util
> Affects Versions: 1.4.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Critical
>
> One of the changes in Kudu 1.4 is more comprehensive repair functionality
> in log block manager (LBM) startup. Amongst other things, this includes a
> heuristic to detect whether an LBM container consumes more disk space than
> it should, based on the live blocks in the container. If the heuristic
> fires, the LBM reclaims the extra disk space by truncating the end of the
> container and repunching all of the dead blocks in it.
> We brought up Kudu 1.4 on a large production cluster running xfs and observed
> pathologically slow startup times. On one node, there was a three-hour gap
> between the last bit of data directory processing and the end of LBM startup
> overall. This time can only be attributed to hole repunching, which is
> executed by the same set of thread pools that open the data directories.
> Further research revealed that on xfs in el6, a hole punch via fallocate()
> _always_ includes an fsync() (in the kernel), even if the underlying data was
> already punched out. This isn't the case with ext4, nor does it appear to be
> the case with xfs in more modern kernels (though this hasn't been confirmed).
> xfs provides the [XFS_IOC_UNRESVSP64
> ioctl|https://linux.die.net/man/3/xfsctl], which can be used to deallocate
> space from a file. That sounds an awful lot like hole punching, and some
> quick performance tests show that it doesn't incur the cost of an fsync(). We
> should switch over to it when punching holes on xfs: certainly on older
> (i.e. el6) kernels, and potentially everywhere for simplicity's sake. A
> sketch of what this could look like follows.
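>
> A minimal C sketch of the xfs-specific path (assuming the xfsprogs-devel
> headers, which provide XFS_IOC_UNRESVSP64 and xfs_flock64_t; the file name
> and sizes are illustrative, not actual Kudu code):
> {noformat}
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <unistd.h>
> #include <xfs/xfs.h>
>
> /* Deallocate [off, off+len) in an xfs file. Like
>    fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE), this leaves the
>    file size unchanged, but it avoids the implicit fsync() on el6. */
> static int punch_hole_xfs(int fd, long long off, long long len) {
>   xfs_flock64_t arg;
>   memset(&arg, 0, sizeof(arg));
>   arg.l_whence = SEEK_SET;  /* l_start is relative to the file start */
>   arg.l_start = off;
>   arg.l_len = len;
>   return ioctl(fd, XFS_IOC_UNRESVSP64, &arg);
> }
>
> int main(void) {
>   int fd = open("foo", O_WRONLY);  /* "foo" as in the tests above */
>   if (fd < 0 || punch_hole_xfs(fd, 0, 100LL * 1024 * 1024) < 0) {
>     perror("punch");
>     return 1;
>   }
>   close(fd);
>   return 0;
> }
> {noformat}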
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)