[ https://issues.apache.org/jira/browse/KUDU-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060114#comment-16060114 ]

Adar Dembo commented on KUDU-2052:
----------------------------------

The following experiments were run using a single spinning HDD on a CentOS 6.6 
machine. First, a filesystem of the type under test was created in a 10G file 
and mounted with -o loop. Inside the filesystem, a 1G test file was created 
using dd from /dev/urandom. The first 100M of the test file were punched out 
via fallocate, sync was run, and then the following repunching tests were run 
in a loop under perf stat.

ext4 with fallocate-based hole punching:
{noformat}
 Performance counter stats for 'fallocate -p -o 0 -l 100M foo' (1000 runs):

          0.269927 task-clock                #    0.635 CPUs utilized            ( +-  0.33% )
                 0 context-switches          #    0.000 K/sec
                 0 cpu-migrations            #    0.041 K/sec                    ( +- 30.00% )
               150 page-faults               #    0.555 M/sec                    ( +-  0.01% )
           770,390 cycles                    #    2.854 GHz                      ( +-  0.33% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
           477,294 instructions              #    0.62  insns per cycle          ( +-  0.04% )
            98,535 branches                  #  365.044 M/sec                    ( +-  0.03% )
             3,979 branch-misses             #    4.04% of all branches          ( +-  0.61% )

       0.000425207 seconds time elapsed                                          ( +-  0.45% )
{noformat}

xfs filesystem with fallocate-based hole punching:
{noformat}
 Performance counter stats for 'fallocate -p -o 0 -l 100M foo' (1000 runs):

          0.403296 task-clock                #    0.013 CPUs utilized            ( +-  0.32% )
                 2 context-switches          #    0.005 M/sec                    ( +-  0.17% )
                 0 cpu-migrations            #    0.017 K/sec                    ( +- 37.68% )
               150 page-faults               #    0.371 M/sec                    ( +-  0.01% )
         1,112,706 cycles                    #    2.759 GHz                      ( +-  0.17% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
           505,027 instructions              #    0.45  insns per cycle          ( +-  0.03% )
           103,652 branches                  #  257.013 M/sec                    ( +-  0.02% )
             5,750 branch-misses             #    5.55% of all branches          ( +-  0.04% )

       0.031220273 seconds time elapsed                                          ( +-  0.57% )
{noformat}

xfs filesystem with XFS_IOC_UNRESVSP64-based hole punching:
{noformat}
 Performance counter stats for 'xfs_io -c unresvsp 0 104857600 foo' (1000 runs):

          0.477930 task-clock                #    0.677 CPUs utilized            ( +-  0.28% )
                 0 context-switches          #    0.004 K/sec                    ( +- 70.68% )
                 0 cpu-migrations            #    0.017 K/sec                    ( +- 39.42% )
               215 page-faults               #    0.449 M/sec                    ( +-  0.01% )
         1,463,629 cycles                    #    3.062 GHz                      ( +-  0.15% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
         1,150,346 instructions              #    0.79  insns per cycle          ( +-  0.01% )
           241,338 branches                  #  504.964 M/sec                    ( +-  0.01% )
             9,753 branch-misses             #    4.04% of all branches          ( +-  0.02% )

       0.000706070 seconds time elapsed                                          ( +-  0.36% )
{noformat}

> Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
> --------------------------------------------------------------
>
>                 Key: KUDU-2052
>                 URL: https://issues.apache.org/jira/browse/KUDU-2052
>             Project: Kudu
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Critical
>
> One of the changes in Kudu 1.4 is a more comprehensive repair functionality 
> in log block manager startup. Amongst other things this includes a heuristic 
> to detect whether an LBM container consumes more disk space than it should, 
> based on the live blocks in the container. If the heuristic fires, the LBM 
> reclaims the extra disk space by truncating the end of the container and 
> repunching out all of the dead blocks in the container.
>
> We brought up Kudu 1.4 on a large production cluster running xfs and observed 
> pathologically slow startup times. On one node, there was a three hour gap 
> between the last bit of data directory processing and the end of LBM startup 
> in general. This time can only be attributed to hole repunching, which is 
> executed by the same set of thread pools that open the data directories.
>
> Further research revealed that on xfs in el6, a hole punch via fallocate() 
> _always_ includes an fsync() (in the kernel), even if the underlying data was 
> already punched out. This isn't the case with ext4, nor does it appear to be 
> the case with xfs in more modern kernels (though this hasn't been confirmed).
>
> xfs provides the [XFS_IOC_UNRESVSP64 
> ioctl|https://linux.die.net/man/3/xfsctl], which can be used to deallocate 
> space from a file. That sounds an awful lot like hole punching, and some 
> quick performance tests show that it doesn't incur the cost of an fsync(). We 
> should switch over to it when punching holes on xfs. Certainly on older (i.e. 
> el6) kernels, and potentially everywhere for simplicity's sake.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)