[ 
https://issues.apache.org/jira/browse/HDFS-16403?focusedWorklogId=708894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708894
 ]

ASF GitHub Bot logged work on HDFS-16403:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jan/22 07:09
            Start Date: 14/Jan/22 07:09
    Worklog Time Spent: 10m 
      Work Description: cndaimin commented on pull request #3842:
URL: https://github.com/apache/hadoop/pull/3842#issuecomment-1012833910


   @jojochuang @fapifta Thanks for the review.
   
   We use `fio` with 60 threads to do random reads on files under the test 
directory and measure the performance by read IOPS. The test steps are as 
follows:
   
   -  Prepare test files to speed up the next random reads.
   -  Drop page cache of both client and datanode servers.
   -  Do the random read test with 60 threads.
   
   The test scripts:
   ```
   # Prepare test files to speed up the next random reads.
   fio -iodepth=32 -rw=write -ioengine=libaio -bs=4096k -size=1G -direct=0 
-runtime=600 -directory=/mnt/dfs/iotest -numjobs=60 -thread -group_reporting 
-name=i
   
   # Drop page cache of both client and datanode servers.
   pssh -h ~/hosts -t 0 -i "sync && echo 1 > /proc/sys/vm/drop_caches"
   
   # Do the random read test.
   fio -iodepth=1 -rw=randread -ioengine=libaio -bs=512 -size=1G -direct=0 
-runtime=120 -directory=/mnt/dfs/iotest -numjobs=60 -thread -group_reporting 
-name=i
   ```
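   For completeness, the mount behind `/mnt/dfs` can be reproduced roughly as 
follows. This is a sketch: the NameNode URI, mount point, and connection ID are 
placeholders, and only the `-omax_background` option itself comes from this 
patch:
   ```
   # Mount HDFS through fuse_dfs with a larger background request limit
   # (dfs://namenode:8020 and /mnt/dfs are illustrative).
   fuse_dfs dfs://namenode:8020 /mnt/dfs -omax_background=100

   # Confirm the value the kernel-side FUSE connection picked up; the
   # numeric connection directory varies per mount.
   cat /sys/fs/fuse/connections/55/max_background
   ```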
   And the test results:
   
   - With default `max_background`, which is 12
   
   ```
   # cat /sys/fs/fuse/connections/55/max_background
   12
   # fio -iodepth=1 -rw=randread -ioengine=libaio -bs=512 -size=1G -direct=0 
-runtime=120 -directory=/mnt/dfs/iotest -numjobs=60 -thread -group_reporting 
-name=i
   i: (g=0): rw=randread, bs=(R) 512B-512B, (W) 512B-512B, (T) 512B-512B, 
ioengine=libaio, iodepth=1
   ...
   fio-3.7
   Starting 60 threads
   Jobs: 60 (f=60): [r(60)][100.0%][r=722KiB/s,w=0KiB/s][r=1444,w=0 IOPS][eta 
00m:00s]]
   i: (groupid=0, jobs=60): err= 0: pid=13143: Fri Jan 14 14:34:35 2022
      read: IOPS=1365, BW=683KiB/s (699kB/s)(80.0MiB/120043msec)
       slat (nsec): min=1868, max=331615k, avg=43862093.94, stdev=10900971.93
       clat (nsec): min=615, max=234495, avg=2124.79, stdev=1111.62
        lat (usec): min=2, max=331618, avg=43865.27, stdev=10901.06
       clat percentiles (nsec):
        |  1.00th=[ 1176],  5.00th=[ 1416], 10.00th=[ 1528], 20.00th=[ 1704],
        | 30.00th=[ 1864], 40.00th=[ 1992], 50.00th=[ 2064], 60.00th=[ 2160],
        | 70.00th=[ 2256], 80.00th=[ 2384], 90.00th=[ 2576], 95.00th=[ 2768],
        | 99.00th=[ 3408], 99.50th=[ 7968], 99.90th=[16064], 99.95th=[19584],
        | 99.99th=[27008]
      bw (  KiB/s): min=    0, max=   16, per=1.66%, avg=11.34, stdev= 1.25, 
samples=14398
      iops        : min=    1, max=   32, avg=22.72, stdev= 2.48, samples=14398
     lat (nsec)   : 750=0.34%, 1000=0.10%
     lat (usec)   : 2=40.90%, 4=57.85%, 10=0.51%, 20=0.25%, 50=0.05%
     lat (usec)   : 100=0.01%, 250=0.01%
     cpu          : usr=0.02%, sys=0.04%, ctx=324069, majf=0, minf=60
     IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
        submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
        complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
        issued rwts: total=163917,0,0,0 short=0,0,0,0 dropped=0,0,0,0
        latency   : target=0, window=0, percentile=100.00%, depth=1
   
   Run status group 0 (all jobs):
      READ: bw=683KiB/s (699kB/s), 683KiB/s-683KiB/s (699kB/s-699kB/s), 
io=80.0MiB (83.9MB), run=120043-120043msec
   ``` 
   
   - With `-omax_background=100`
   ```
   # cat /sys/fs/fuse/connections/55/max_background
   100
   # fio -iodepth=1 -rw=randread -ioengine=libaio -bs=512 -size=1G -direct=0 
-runtime=120 -directory=/mnt/dfs/iotest -numjobs=60 -thread -group_reporting 
-name=i
   i: (g=0): rw=randread, bs=(R) 512B-512B, (W) 512B-512B, (T) 512B-512B, 
ioengine=libaio, iodepth=1
   ...
   fio-3.7
   Starting 60 threads
   Jobs: 60 (f=60): [r(60)][100.0%][r=1768KiB/s,w=0KiB/s][r=3536,w=0 IOPS][eta 
00m:00s]]
   i: (groupid=0, jobs=60): err= 0: pid=12582: Fri Jan 14 14:31:25 2022
      read: IOPS=3569, BW=1785KiB/s (1828kB/s)(209MiB/120037msec)
       slat (nsec): min=1576, max=708718k, avg=16797865.36, stdev=16920653.78
       clat (nsec): min=603, max=343984, avg=2023.56, stdev=1829.69
        lat (usec): min=2, max=708721, avg=16800.82, stdev=16920.78
       clat percentiles (nsec):
        |  1.00th=[  748],  5.00th=[ 1224], 10.00th=[ 1400], 20.00th=[ 1592],
        | 30.00th=[ 1736], 40.00th=[ 1848], 50.00th=[ 1928], 60.00th=[ 2008],
        | 70.00th=[ 2096], 80.00th=[ 2192], 90.00th=[ 2352], 95.00th=[ 2512],
        | 99.00th=[ 7648], 99.50th=[11968], 99.90th=[21632], 99.95th=[25984],
        | 99.99th=[39168]
      bw (  KiB/s): min=    6, max=   54, per=1.67%, avg=29.71, stdev= 6.19, 
samples=14397
      iops        : min=   12, max=  108, avg=59.46, stdev=12.37, samples=14397
     lat (nsec)   : 750=0.99%, 1000=0.80%
     lat (usec)   : 2=57.08%, 4=39.87%, 10=0.60%, 20=0.53%, 50=0.13%
     lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%
     cpu          : usr=0.04%, sys=0.08%, ctx=423468, majf=0, minf=60
     IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
        submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
        complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
        issued rwts: total=428483,0,0,0 short=0,0,0,0 dropped=0,0,0,0
        latency   : target=0, window=0, percentile=100.00%, depth=1
   
   Run status group 0 (all jobs):
      READ: bw=1785KiB/s (1828kB/s), 1785KiB/s-1785KiB/s (1828kB/s-1828kB/s), 
io=209MiB (219MB), run=120037-120037msec
   ```
   
   In our test, setting `max_background` to 100 improved the read IOPS from 
1365 to 3569. And when resources like CPU and memory are sufficient (which is 
generally not a problem), there seem to be no side effects of setting a bigger 
value. We have been running `-omax_background=100` in our production 
environment for months and it looks good. @fapifta 
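
   As a quick sanity check of the gain, the IOPS figures above can be plugged 
into a short shell snippet (the numbers are copied from the two fio runs; 
nothing else is assumed):
   ```
   #!/bin/sh
   # Read IOPS from the two fio runs above.
   default_iops=1365   # default max_background=12
   tuned_iops=3569     # -omax_background=100

   # Integer percentage gain: (3569 - 1365) * 100 / 1365 = 161%, i.e. ~2.6x.
   gain=$(( (tuned_iops - default_iops) * 100 / default_iops ))
   echo "IOPS gain: ${gain}%"
   ```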


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 708894)
    Time Spent: 2h 50m  (was: 2h 40m)

> Improve FUSE IO performance by supporting FUSE parameter max_background
> -----------------------------------------------------------------------
>
>                 Key: HDFS-16403
>                 URL: https://issues.apache.org/jira/browse/HDFS-16403
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: fuse-dfs
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: daimin
>            Assignee: daimin
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When examining the FUSE IO performance on HDFS, we found that the number of 
> simultaneous IO requests is limited to a fixed number, like 12. This 
> limitation makes the IO performance of the FUSE client quite unacceptable. We 
> did some research on this and, inspired by the article [Performance and 
> Resource Utilization of FUSE User-Space File 
> Systems|https://dl.acm.org/doi/fullHtml/10.1145/3310148], found that the FUSE 
> parameter '{{{}max_background{}}}' decides the number of simultaneous IO 
> requests, which is 12 by default.
> We add 'max_background' to the fuse_dfs mount options; the FUSE kernel will 
> apply it when an option value is given.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
