A jumbo ethernet frame can be 9000 bytes. The ethernet frame overhead is at least 38 bytes and the minimum TCP/IP header size is 40 bytes; combined, that's 78 bytes, or about 0.87% of the jumbo frame. Gluster's RPC also adds a few bytes (not sure how many and don't have time to test at the moment, but for the sake of argument we'll just say 20 bytes), so, all together, a full frame is about 99% efficient. If you write 20 bytes to a file (for an extreme example) then you'll have your 20 bytes + RPC header + TCP/IP header + ethernet header: 98 bytes of headers for 20 bytes of data, 118 bytes in all. Headers being about 83% of the frame means that your packet is only about 17% efficient. That's per replica, so if you have a replica 3 that's three individual frames with 98 bytes of headers each to write the same 20 bytes of data. Those go out to the three servers and the client waits for their responses. So you have a network round trip + a tiny bit of latency for stacking the three frames in the kernel + disk write latency. That's a lot of overhead, and no networked storage can ever be as fast as writing to a local disk.
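
To put rough numbers on that, here is a small Python sketch. The 38-byte and 40-byte figures are the ones quoted above; the 20-byte RPC header is the same placeholder guess, not a measured Gluster value, and the function name is made up for illustration. It also ignores ethernet's minimum-frame padding and any TCP options, so treat the results as ballpark only.

```python
# Rough per-frame efficiency: payload bytes / total bytes on the wire.
ETHERNET_OVERHEAD = 38   # per-frame ethernet overhead, figure quoted above
TCP_IP_HEADERS = 40      # minimum TCP/IP headers, no options
RPC_HEADER_GUESS = 20    # placeholder guess, NOT a measured Gluster value


def frame_efficiency(payload_bytes, replicas=1):
    """Return (efficiency, total_bytes_sent) for one write of payload_bytes.

    Assumes the whole write fits in a single frame and that the client
    sends one frame per replica.
    """
    overhead = ETHERNET_OVERHEAD + TCP_IP_HEADERS + RPC_HEADER_GUESS
    frame = payload_bytes + overhead
    return payload_bytes / frame, frame * replicas


if __name__ == "__main__":
    # 8902 is the payload that exactly fills a 9000-byte frame
    # under the assumptions above (9000 - 98 bytes of headers).
    for payload in (20, 4096, 8902):
        eff, sent = frame_efficiency(payload, replicas=3)
        print(f"{payload:>5} byte write: {eff:6.1%} efficient, "
              f"{sent} bytes sent for replica 3")
```

For a payload that fills the jumbo frame this comes out at roughly 99%, and for a 20-byte write it is well under a fifth, before any round-trip or disk latency is counted.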

The question, however, is: does it need to be? Do you care if a single thread is slower in a clustered environment than it would be on a local RAID stack? In good clustered engineering your workload will be handled by multiple threads over a cluster of workers. Overall, you have more threads than you could have on a single machine. This allows servicing a greater overall workload than you could without a cluster. I refer to that as comparing apples to orchards (1 <https://joejulian.name/post/dont-get-stuck-micro-engineering-for-scale/>).

On 04/13/18 10:58, Anastasia Belyaeva wrote:
Thanks a lot for your reply!

You guessed it right though - mailing lists, various blogs, documentation, videos and even source code at this point. Changing some of the options does make performance slightly better, but nothing particularly groundbreaking.

So, if I understand you correctly, no one has yet managed to get acceptable performance (relative to underlying hardware capabilities) with smaller block sizes? Is there an explanation for this?


2018-04-13 1:57 GMT+03:00 Vlad Kopylov <vladk...@gmail.com>:

    Guess you went through the user lists and tried something like this
    already:
    http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
    I have the exact same setup, and below is as far as it went after
    months of trial and error.
    We all have somewhat the same setup and the same issue with this - you
    can find the same post as yours on a daily basis.

    On Wed, Apr 11, 2018 at 3:03 PM, Anastasia Belyaeva
    <anastasia....@gmail.com> wrote:

        Hello everybody!

        I have 3 gluster servers (*gluster 3.12.6, CentOS 7.2*; those
        are actually virtual machines located on 3 separate physical
        XenServer 7.1 servers).

        They are all connected via an InfiniBand network. Iperf3 shows
        around *23 Gbit/s network bandwidth* between each pair of them.

        Each server has 3 HDDs put into a *stripe 3* thin pool (LVM2)
        with a logical volume created on top of it, formatted with
        *xfs*. Gluster top reports the following throughput:

            root@fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
            Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
            Throughput *631.82 MBps* time 3.3989 secs
            Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
            Throughput *566.96 MBps* time 3.7877 secs
            Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
            Throughput *546.65 MBps* time 3.9285 secs


            root@fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
            Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
            Throughput *539.60 MBps* time 3.9798 secs
            Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
            Throughput *580.07 MBps* time 3.7021 secs


        And there are two *pure replicated ('replica 2' and 'replica 3')*
        volumes. The 'replica 2' volume is for testing purposes only.

            Volume Name: r2vol
            Type: Replicate
            Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
            Status: Started
            Snapshot Count: 0
            Number of Bricks: 1 x 2 = 2
            Transport-type: tcp
            Bricks:
            Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
            Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
            Options Reconfigured:
            nfs.disable: on

            Volume Name: r3vol
            Type: Replicate
            Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
            Status: Started
            Snapshot Count: 0
            Number of Bricks: 1 x 3 = 3
            Transport-type: tcp
            Bricks:
            Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
            Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
            Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
            Options Reconfigured:
            nfs.disable: on



        The *client* is also gluster 3.12.6, on a CentOS 7.3 virtual machine,
        using a *FUSE mount*:

            root@centos7u3-nogdesktop2 ~ $ mount |grep gluster
            gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
            gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)



        *The problem* is that there is a significant performance loss
        with smaller block sizes. For example:

        _4K block size_
        [replica 3 volume]
        root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 11.2207 s, *95.7 MB/s*

        [replica 2 volume]
        root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
        262144+0 records in
        262144+0 records out
        1073741824 bytes (1.1 GB) copied, 12.0149 s, *89.4 MB/s*

        _512K block size_
        [replica 3 volume]
        root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
        2048+0 records in
        2048+0 records out
        1073741824 bytes (1.1 GB) copied, 5.27207 s, *204 MB/s*

        [replica 2 volume]
        root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
        2048+0 records in
        2048+0 records out
        1073741824 bytes (1.1 GB) copied, 4.22321 s, *254 MB/s*

        With a bigger block size it's still not where I expect it to be,
        but at least it starts to make some sense.

        I've been trying to solve this for a very long time with no luck.
        I've already tried both kernel tuning (different 'tuned'
        profiles and the ones recommended in the "Linux Kernel Tuning"
        section) and tweaking gluster volume options, including
        write-behind/flush-behind/write-behind-window-size.
        The latter, to my surprise, didn't make any difference. At
        first I thought it was a buffering issue, but it turns out
        it does buffer writes, just not very efficiently (at least
        that's what it looks like in the *gluster profile output*):

            root@fsnode2 ~ $ gluster volume profile r3vol info clear
            ...
            Cleared stats.


            root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
            262144+0 records in
            262144+0 records out
            1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s

            root@fsnode2 ~ $ gluster volume profile r3vol info
            Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
            -------------------------------------------------------
            Cumulative Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      18.00 us      18.00 us      18.00 us              1      STATFS
                  0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
                  0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
                  0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
                  0.01     177.00 us     177.00 us     177.00 us              1      CREATE
                  0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
                  0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
                 99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes
            Interval 0 Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      18.00 us      18.00 us      18.00 us              1      STATFS
                  0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
                  0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
                  0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
                  0.01     177.00 us     177.00 us     177.00 us              1      CREATE
                  0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
                  0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
                 99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes
            Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
            -------------------------------------------------------
            Cumulative Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      33.00 us      33.00 us      33.00 us              1      STATFS
                  0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
                  0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
                  0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
                  0.01     157.00 us     157.00 us     157.00 us              1      CREATE
                  0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
                  0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
                 99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes
            Interval 0 Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      33.00 us      33.00 us      33.00 us              1      STATFS
                  0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
                  0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
                  0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
                  0.01     157.00 us     157.00 us     157.00 us              1      CREATE
                  0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
                  0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
                 99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes
            Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
            -------------------------------------------------------
            Cumulative Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      58.00 us      58.00 us      58.00 us              1      STATFS
                  0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
                  0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
                  0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
                  0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
                  0.01     239.00 us     239.00 us     239.00 us              1      CREATE
                  0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
                 99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes
            Interval 0 Stats:
               Block Size:               4096b+      8192b+     16384b+
             No. of Reads:                    0           0           0
            No. of Writes:                 1576        4173       19605
               Block Size:              32768b+     65536b+    131072b+
             No. of Reads:                    0           0           0
            No. of Writes:                 7777        1847         657
             %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
             ---------   -----------   -----------   -----------   ------------        ----
                  0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
                  0.00      58.00 us      58.00 us      58.00 us              1      STATFS
                  0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
                  0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
                  0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
                  0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
                  0.01     239.00 us     239.00 us     239.00 us              1      CREATE
                  0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
                 99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
                Duration: 38 seconds
               Data Read: 0 bytes
            Data Written: 1073741824 bytes



        At this point I've officially run out of ideas about where to
        look next. So any help, suggestions or pointers are highly
        appreciated!

-- Best regards,
        Anastasia Belyaeva






        _______________________________________________
        Gluster-users mailing list
        Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
        http://lists.gluster.org/mailman/listinfo/gluster-users
        <http://lists.gluster.org/mailman/listinfo/gluster-users>





--
Best regards,
Anastasia Belyaeva






