Re: [Gluster-users] glusterfs under high load failing?

Pranith Kumar Karampuri Wed, 15 Oct 2014 08:25:40 -0700


On 10/14/2014 01:20 AM, Roman wrote:

ok. done.
This time there were no disconnects; at least all of the VMs are working, but I got some mails from a VM about IO writes again.
WARNINGs: Read IO Wait time is 1.45 (outside range [0:1]).

This warning says 'Read IO wait', yet not a single READ operation came to gluster. Wondering why that is :-/. Any clue? There is at least one write which took 3 seconds according to the stats. At least one synchronization operation (FINODELK) took 23 seconds. Could you give the logs of this run, for the mount, glustershd, and the bricks?


Pranith

here is the output

root@stor1:~# gluster volume profile HA-WIN-TT-1T info
Brick: stor1:/exports/NFS-WIN/1T
--------------------------------
Cumulative Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
%-latency     Avg-latency  Min-Latency     Max-Latency  No. of calls  Fop
---------  --------------  -----------  --------------  ------------  ----------
     0.00         0.00 us      0.00 us         0.00 us            25  RELEASE
     0.00         0.00 us      0.00 us         0.00 us            16  RELEASEDIR
     0.00        64.00 us     52.00 us        76.00 us             2  ENTRYLK
     0.00        73.50 us     51.00 us        96.00 us             2  FLUSH
     0.00        68.43 us     30.00 us       135.00 us             7  STATFS
     0.00        54.31 us     44.00 us       109.00 us            16  OPENDIR
     0.00        50.75 us     16.00 us        74.00 us            24  FSTAT
     0.00        47.77 us     19.00 us       119.00 us            26  GETXATTR
     0.00        59.21 us     21.00 us        89.00 us            24  OPEN
     0.00        59.39 us     22.00 us       296.00 us            28  READDIR
     0.00      4972.00 us   4972.00 us      4972.00 us             1  CREATE
     0.00        97.42 us     19.00 us       184.00 us            62  LOOKUP
     0.00        89.49 us     20.00 us       656.00 us           324  FXATTROP
     3.91   1255944.81 us    127.00 us  23397532.00 us           189  FSYNC
     7.40   3406275.50 us     17.00 us  23398013.00 us           132  INODELK
    34.96     94598.02 us      8.00 us  23398705.00 us         22445  FINODELK
    53.73       442.66 us     79.00 us   3116494.00 us       7372799  WRITE
    Duration: 7813 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Interval 0 Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
%-latency     Avg-latency  Min-Latency     Max-Latency  No. of calls  Fop
---------  --------------  -----------  --------------  ------------  ----------
     0.00         0.00 us      0.00 us         0.00 us            25  RELEASE
     0.00         0.00 us      0.00 us         0.00 us            16  RELEASEDIR
     0.00        64.00 us     52.00 us        76.00 us             2  ENTRYLK
     0.00        73.50 us     51.00 us        96.00 us             2  FLUSH
     0.00        68.43 us     30.00 us       135.00 us             7  STATFS
     0.00        54.31 us     44.00 us       109.00 us            16  OPENDIR
     0.00        50.75 us     16.00 us        74.00 us            24  FSTAT
     0.00        47.77 us     19.00 us       119.00 us            26  GETXATTR
     0.00        59.21 us     21.00 us        89.00 us            24  OPEN
     0.00        59.39 us     22.00 us       296.00 us            28  READDIR
     0.00      4972.00 us   4972.00 us      4972.00 us             1  CREATE
     0.00        97.42 us     19.00 us       184.00 us            62  LOOKUP
     0.00        89.49 us     20.00 us       656.00 us           324  FXATTROP
     3.91   1255944.81 us    127.00 us  23397532.00 us           189  FSYNC
     7.40   3406275.50 us     17.00 us  23398013.00 us           132  INODELK
    34.96     94598.02 us      8.00 us  23398705.00 us         22445  FINODELK
    53.73       442.66 us     79.00 us   3116494.00 us       7372799  WRITE
    Duration: 7813 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Brick: stor2:/exports/NFS-WIN/1T
--------------------------------
Cumulative Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
%-latency     Avg-latency  Min-Latency     Max-Latency  No. of calls  Fop
---------  --------------  -----------  --------------  ------------  ----------
     0.00         0.00 us      0.00 us         0.00 us            25  RELEASE
     0.00         0.00 us      0.00 us         0.00 us            16  RELEASEDIR
     0.00        61.50 us     46.00 us        77.00 us             2  ENTRYLK
     0.00        82.00 us     67.00 us        97.00 us             2  FLUSH
     0.00       265.00 us    265.00 us       265.00 us             1  CREATE
     0.00        57.43 us     30.00 us        85.00 us             7  STATFS
     0.00        61.12 us     37.00 us       107.00 us            16  OPENDIR
     0.00        44.04 us     12.00 us        86.00 us            24  FSTAT
     0.00        41.42 us     24.00 us        96.00 us            26  GETXATTR
     0.00        45.93 us     24.00 us       133.00 us            28  READDIR
     0.00        57.17 us     25.00 us       147.00 us            24  OPEN
     0.00       145.28 us     31.00 us       288.00 us            32  READDIRP
     0.00        39.50 us     10.00 us       152.00 us           132  INODELK
     0.00       330.97 us     20.00 us     14280.00 us            62  LOOKUP
     0.00        79.06 us     19.00 us       851.00 us           430  FXATTROP
     0.02        29.32 us      7.00 us     28154.00 us         22568  FINODELK
     7.80   1313096.68 us    125.00 us  23281862.00 us           189  FSYNC
    92.18       397.92 us     76.00 us   1838343.00 us       7372799  WRITE
    Duration: 7811 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Interval 0 Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
%-latency     Avg-latency  Min-Latency     Max-Latency  No. of calls  Fop
---------  --------------  -----------  --------------  ------------  ----------
     0.00         0.00 us      0.00 us         0.00 us            25  RELEASE
     0.00         0.00 us      0.00 us         0.00 us            16  RELEASEDIR
     0.00        61.50 us     46.00 us        77.00 us             2  ENTRYLK
     0.00        82.00 us     67.00 us        97.00 us             2  FLUSH
     0.00       265.00 us    265.00 us       265.00 us             1  CREATE
     0.00        57.43 us     30.00 us        85.00 us             7  STATFS
     0.00        61.12 us     37.00 us       107.00 us            16  OPENDIR
     0.00        44.04 us     12.00 us        86.00 us            24  FSTAT
     0.00        41.42 us     24.00 us        96.00 us            26  GETXATTR
     0.00        45.93 us     24.00 us       133.00 us            28  READDIR
     0.00        57.17 us     25.00 us       147.00 us            24  OPEN
     0.00       145.28 us     31.00 us       288.00 us            32  READDIRP
     0.00        39.50 us     10.00 us       152.00 us           132  INODELK
     0.00       330.97 us     20.00 us     14280.00 us            62  LOOKUP
     0.00        79.06 us     19.00 us       851.00 us           430  FXATTROP
     0.02        29.32 us      7.00 us     28154.00 us         22568  FINODELK
     7.80   1313096.68 us    125.00 us  23281862.00 us           189  FSYNC
    92.18       397.92 us     76.00 us   1838343.00 us       7372799  WRITE
    Duration: 7811 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes
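
The reading of these stats given in the reply above (a write that took about 3 seconds, a FINODELK that took about 23 seconds) can be reproduced mechanically. What follows is only a minimal sketch, assuming Python 3.9+; the names FopStat, parse_profile_rows and slow_fops are made up for illustration and are not part of any gluster tooling. The regular expression is written so that it also copes with rows that a mail archive has run together on one line.

```python
import re
from typing import NamedTuple


class FopStat(NamedTuple):
    pct_latency: float
    avg_us: float
    min_us: float
    max_us: float
    calls: int
    fop: str


# One row: %-latency, avg/min/max latency in microseconds, call count, fop name.
# \s* before the fop name tolerates rows where the columns were fused.
ROW = re.compile(
    r"(\d+\.\d+)\s+(\d+\.\d+) us\s+(\d+\.\d+) us\s+(\d+\.\d+) us\s+(\d+)\s*([A-Z]+)"
)


def parse_profile_rows(text: str) -> list[FopStat]:
    """Extract per-fop rows from 'gluster volume profile <vol> info' text."""
    return [
        FopStat(float(p), float(a), float(mn), float(mx), int(c), f)
        for p, a, mn, mx, c, f in ROW.findall(text)
    ]


def slow_fops(rows: list[FopStat], threshold_s: float = 1.0) -> list[FopStat]:
    """Return rows whose maximum latency exceeds threshold_s seconds."""
    limit_us = threshold_s * 1_000_000
    return [r for r in rows if r.max_us > limit_us]


# Last five rows of the stor1 cumulative stats, copied from the output above.
SAMPLE = (
    "0.00 89.49 us 20.00 us 656.00 us 324FXATTROP"
    "3.91 1255944.81 us 127.00 us 23397532.00 us 189FSYNC"
    "7.40 3406275.50 us 17.00 us 23398013.00 us 132INODELK"
    "34.96 94598.02 us 8.00 us 23398705.00 us 22445FINODELK"
    "53.73 442.66 us 79.00 us 3116494.00 us 7372799WRITE"
)

if __name__ == "__main__":
    for row in slow_fops(parse_profile_rows(SAMPLE)):
        print(f"{row.fop:10s} max {row.max_us / 1e6:6.2f} s over {row.calls} calls")
```

The 1-second threshold is arbitrary; it was picked only because the VM warnings in this thread use a [0:1] range.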

Does this make things any clearer?
2014-10-13 20:40 GMT+03:00 Roman <[email protected]>:
    I think I may know what the issue was. There was an iscsitarget
    service running that was exporting this generated block device, so
    maybe my colleague's Windows server picked it up and mounted it :) I'll
    see if it happens again.

    2014-10-13 20:27 GMT+03:00 Roman <[email protected]>:

        So may I restart the volume and start the test, or do you need
        something else from this issue?

        2014-10-13 19:49 GMT+03:00 Pranith Kumar Karampuri
        <[email protected]>:


            On 10/13/2014 10:03 PM, Roman wrote:
            hmm,
            seems like another strange issue? Seen this before. Had
            to restart the volume to get my empty space back.
            root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
            total 943718400
            -rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
            root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
            root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
            Filesystem           Size  Used Avail Use% Mounted on
            rootfs               282G  1.1G  266G   1% /
            udev                  10M     0   10M   0% /dev
            tmpfs                1.4G  228K  1.4G   1% /run
            /dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba
                                 282G  1.1G  266G   1% /
            tmpfs                5.0M     0  5.0M   0% /run/lock
            tmpfs                5.2G     0  5.2G   0% /run/shm
            stor1:HA-WIN-TT-1T  1008G  901G   57G  95% /srv/nfs/HA-WIN-TT-1T

            no file, but size is still 901G.
            Both servers show the same.
            Do I really have to restart the volume to fix that?
            IMO this can happen if there is an fd leak. open-fd is the
            only variable that can change with volume restart. How do
            you re-create the bug?

            Pranith
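
One generic way to look for the kind of fd leak suspected here is to check, on each host involved, whether any process still holds a descriptor to a file that has already been unlinked. The sketch below is a plain Linux /proc scan and nothing gluster-specific; the function name deleted_open_files is made up for illustration, and whether a brick process shows the file this way depends on how it opened it, so an empty result does not rule out a leak. It needs root to see other users' processes. Where the CLI supports it, 'gluster volume status <volname> fd' gives gluster's own view of open fds.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Yield (pid, fd, target) for open descriptors whose file was unlinked.

    On Linux, the symlink /proc/<pid>/fd/<n> of an unlinked file reads as
    "<original path> (deleted)".
    """
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if target.endswith(" (deleted)"):
                yield int(pid), int(fd), target


if __name__ == "__main__":
    for pid, fd, target in deleted_open_files():
        print(f"pid {pid} fd {fd}: {target}")
```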
            2014-10-13 19:30 GMT+03:00 Roman <[email protected]>:

                Sure.
                I'll let it run overnight.

                2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri
                <[email protected]>:

                    hi Roman,
                         Do you think we can run this test again?
                    This time, could you enable profiling with
                    'gluster volume profile <volname> start' and run
                    the same test? Then provide the output of
                    'gluster volume profile <volname> info' and the
                    logs after the test.

                    Pranith
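
The sequence requested above (start profiling, run the test, collect the info output) could be scripted roughly as follows. This is only a sketch: profile_cmd and profiled_run are hypothetical helper names, the workload is whatever callable reproduces the test, and the final 'stop' step is an addition that was not asked for in the message. Only the 'gluster volume profile <volname> start|info|stop' subcommands are used.

```python
import subprocess


def profile_cmd(volname, action):
    """Build the argv for 'gluster volume profile <volname> <action>'."""
    if action not in ("start", "info", "stop"):
        raise ValueError(f"unsupported profile action: {action}")
    return ["gluster", "volume", "profile", volname, action]


def _run_cli(argv):
    # Runs the real CLI; raises CalledProcessError on a non-zero exit status.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def profiled_run(volname, workload, run=_run_cli):
    """Start profiling, run workload(), and return the 'info' output."""
    run(profile_cmd(volname, "start"))
    try:
        workload()
        return run(profile_cmd(volname, "info"))
    finally:
        # Assumption: profiling should be switched off again afterwards.
        run(profile_cmd(volname, "stop"))
```

The run parameter exists so the flow can be exercised without a gluster installation by passing a stand-in function.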

                    On 10/13/2014 09:45 PM, Roman wrote:
                    Sure !

                    root@stor1:~# gluster volume info

                    Volume Name: HA-2TB-TT-Proxmox-cluster
                    Type: Replicate
                    Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
                    Status: Started
                    Number of Bricks: 1 x 2 = 2
                    Transport-type: tcp
                    Bricks:
                    Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
                    Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
                    Options Reconfigured:
                    nfs.disable: 0
                    network.ping-timeout: 10

                    Volume Name: HA-WIN-TT-1T
                    Type: Replicate
                    Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
                    Status: Started
                    Number of Bricks: 1 x 2 = 2
                    Transport-type: tcp
                    Bricks:
                    Brick1: stor1:/exports/NFS-WIN/1T
                    Brick2: stor2:/exports/NFS-WIN/1T
                    Options Reconfigured:
                    nfs.disable: 1
                    network.ping-timeout: 10



                    2014-10-13 19:09 GMT+03:00 Pranith Kumar
                    Karampuri <[email protected]>:

                        Could you give your 'gluster volume info'
                        output?

                        Pranith

                        On 10/13/2014 09:36 PM, Roman wrote:
                        Hi,

                        I've got this kind of setup (servers run
                        replica)


                        @ 10G backend
                        gluster storage1
                        gluster storage2
                        gluster client1

                        @1g backend
                        other gluster clients

                        Servers got HW RAID5 with SAS disks.

                        So today I decided to create a 900GB
                        file for an iSCSI target, located on a
                        separate GlusterFS volume, using dd (just
                        a dummy file filled with zeros, bs=1G count
                        900).
                        First of all, the process took quite a
                        long time; the write speed was
                        130 MB/sec (client port was 2 gbps, servers
                        ports were running @ 1gbps).
                        Then it reported something like "endpoint
                        is not connected" and all of my VMs on the
                        other volume started to give me IO errors.
                        Server load was around 4,6 (12 cores total).

                        Maybe it was due to the timeout of 2 secs, so
                        I've made it a bit higher, 10 sec.

                        Also, while dd was creating the image, the VMs
                        very often reported that their disks were
                        slow, like

                        WARNINGs: Read IO Wait time is -0.02
                        (outside range [0:1]).

                        Is 130 MB/sec the maximum bandwidth for
                        all of the volumes in total? Then why would
                        we need 10g backends?

                        HW RAID local speed is 300 MB/sec, so it
                        should not be an issue. Any ideas or maybe
                        any advice?
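
As a rough cross-check of the numbers in this message, the arithmetic below estimates a network ceiling for a replicated write. It is a back-of-the-envelope sketch resting on stated assumptions: that the volume is mounted with the native client, which sends every write to both bricks of a replica pair, and that the link speeds are the ones quoted above (2 Gbit/s on the client, 1 Gbit/s on each server). Protocol overhead is ignored and the function names are illustrative. The byte count and duration in the second calculation are taken from the profile output earlier in the thread, with the profile duration used as an upper bound on the write time.

```python
def mb_per_s(gbps):
    """Convert a link speed in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbps * 1000 / 8


def replicated_write_ceiling(client_gbps, server_gbps, replicas):
    """Upper bound on client write throughput in MB/s.

    Assumes the client uploads one full copy per replica, so its own link
    is shared between the replicas, and each server link carries one copy.
    """
    client_share = mb_per_s(client_gbps) / replicas
    return min(client_share, mb_per_s(server_gbps))


def average_rate_mb_s(bytes_written, seconds):
    """Average write rate in MB/s over a measured interval."""
    return bytes_written / seconds / 1e6


if __name__ == "__main__":
    # dd with bs=1G count=900 writes 900 GiB.
    assert 900 * 2**30 == 966367641600
    print(replicated_write_ceiling(2.0, 1.0, 2))
    print(round(average_rate_mb_s(966367641600, 7813), 1))
```

Under these assumptions the ceiling is 125 MB/s, and the profiled run averaged about 123.7 MB/s, so the observed rate is close to what 1 Gbit/s server ports allow rather than a limit of the volume itself.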


                        Maybe someone has an optimized sysctl.conf
                        for a 10G backend?

                        Mine is pretty simple, the kind that can be
                        found by googling.


                        Just to mention: those VMs were connected
                        using a separate 1gbps interface, which
                        means they should not be affected by the
                        client with the 10g backend.


                        The logs are pretty useless; they just say
                        this during the outage:


                        [2014-10-13 12:09:18.392910] W
                        [client-handshake.c:276:client_ping_cbk]
                        0-HA-2TB-TT-Proxmox-cluster-client-0: timer
                        must have expired

                        [2014-10-13 12:10:08.389708] C
                        [client-handshake.c:127:rpc_client_ping_timer_expired]
                        0-HA-2TB-TT-Proxmox-cluster-client-0:
                        server 10.250.0.1:49159 has not responded
                        in the last 2 seconds, disconnecting.

                        [2014-10-13 12:10:08.390312] W
                        [client-handshake.c:276:client_ping_cbk]
                        0-HA-2TB-TT-Proxmox-cluster-client-0: timer
                        must have expired

                        so I decided to set the timeout a bit higher.
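
To see how often and against which brick these disconnects happen, the critical "has not responded" lines can be pulled out of a client log. The following is a minimal sketch that assumes the log format shown above; ping_timeouts is a made-up name, and the pattern has only been checked against the lines quoted in this message.

```python
import re
from datetime import datetime

PING_EXPIRED = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] C "
    r"\[[^\]]*rpc_client_ping_timer_expired\] "
    r"(?P<client>\S+): server (?P<server>\S+) "
    r"(?:<\S+> )?"  # tolerate an autolink that some mail archives insert
    r"has not responded in the last (?P<timeout>\d+) seconds"
)


def ping_timeouts(log_text):
    """Return (timestamp, client xlator, server, timeout_s) per disconnect."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    events = []
    for m in PING_EXPIRED.finditer(flat):
        when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        events.append((when, m["client"], m["server"], int(m["timeout"])))
    return events


SAMPLE = """
[2014-10-13 12:10:08.389708] C
[client-handshake.c:127:rpc_client_ping_timer_expired]
0-HA-2TB-TT-Proxmox-cluster-client-0:
server 10.250.0.1:49159 has not responded
in the last 2 seconds, disconnecting.
"""

if __name__ == "__main__":
    for when, client, server, timeout in ping_timeouts(SAMPLE):
        print(f"{when} {client} lost {server} after {timeout}s")
```

The timeout captured here corresponds to the volume's network.ping-timeout option, which the 'gluster volume info' output elsewhere in this thread shows as 10 after the change.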

                        So it seems to me that under high load
                        GlusterFS is not usable? 130 MB/s is not so
                        much that it should cause timeouts or
                        make the system so slow that the VMs
                        start to suffer.

                        Of course, after the disconnection the healing
                        process started, but as the VMs lost
                        connection to both servers, it was
                        pretty useless; they could not run anymore.
                        And BTW, when you load the server with such a
                        huge job (dd of 900GB), the healing process
                        goes soooooo slow :)
--Best regards,
                        Roman.


                        _______________________________________________
                        Gluster-users mailing list
                        [email protected]
                        http://supercolony.gluster.org/mailman/listinfo/gluster-users




--
Best regards,
Roman.

