Re: [Gluster-users] glusterfs under high load failing?

Joe Julian Mon, 13 Oct 2014 09:57:21 -0700

Looks like you're mounting NFS? That would be the FSCache in the client.
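
One way to check which case applies is to look up the filesystem type of the mount point in the client's mount table. This is a minimal sketch, assuming a Linux client. The helper name `mount_fstype` is hypothetical, and the path in the example is the one from the quoted session below. A native GlusterFS mount normally shows up as `fuse.glusterfs`, an NFS mount as `nfs` or `nfs4`.

```shell
# mount_fstype MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type recorded for MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts, whose fields are:
#   device mountpoint fstype options dump pass
# If the path is listed more than once, the last (most recent) entry wins.
mount_fstype() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Example: mount_fstype /srv/nfs/HA-WIN-TT-1T
```

If the function prints nothing, the path is not a mount point on that machine.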


On 10/13/2014 09:33 AM, Roman wrote:

hmm,

Seems like another strange issue? I've seen this before. I had to restart the volume to get my free space back.

root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
total 943718400
-rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  282G  1.1G  266G   1% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   1.4G  228K  1.4G   1% /run
/dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba  282G  1.1G  266G   1% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   5.2G     0  5.2G   0% /run/shm
stor1:HA-WIN-TT-1T                                     1008G  901G   57G  95% /srv/nfs/HA-WIN-TT-1T

The file is gone, but the used space is still 901G.
Both servers show the same.
Do I really have to restart the volume to fix that?
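
For reference, the numbers in the listing above are consistent with each other: the deleted file was exactly 900 GiB, which matches the space `df -h` still reports as used. A quick check with shell arithmetic follows. The two figures are copied from the `ls -l` output above, and the variable names are only illustrative.

```shell
file_bytes=966367641600    # size of "disk" from ls -l, in bytes
total_kib=943718400        # "total" line from ls -l, in 1 KiB blocks

echo "file size: $(( file_bytes / 1024 / 1024 / 1024 )) GiB"
echo "allocated: $(( total_kib / 1024 / 1024 )) GiB"
```

Both lines print 900 GiB, so the space that is still counted as used corresponds to the removed file rather than to other data on the volume.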

2014-10-13 19:30 GMT+03:00 Roman <[email protected]>:


    Sure.
    I'll let it run overnight.

    2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri
    <[email protected]>:

        Hi Roman,
             Do you think we can run this test again? This time,
        could you run 'gluster volume profile <volname> start' first,
        repeat the same test, and then provide the output of
        'gluster volume profile <volname> info' along with the logs?

        Pranith
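
The profiling run requested above could look roughly like this. It is a sketch only: the `start` and `info` subcommands are the ones named in the message, and `stop` is added to turn profiling off afterwards. The helper functions, the `DRY_RUN` switch, the volume name in the example and the output file name are assumptions for illustration. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_start VOLNAME: enable profiling before the test.
profile_start() {
    run gluster volume profile "$1" start
}

# profile_collect VOLNAME: print the counters, then disable profiling.
profile_collect() {
    run gluster volume profile "$1" info
    run gluster volume profile "$1" stop
}

# Example (on one of the storage servers):
#   profile_start HA-WIN-TT-1T
#   ... repeat the dd test from the client ...
#   profile_collect HA-WIN-TT-1T > profile-info.txt
```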

        On 10/13/2014 09:45 PM, Roman wrote:

        Sure !

        root@stor1:~# gluster volume info

        Volume Name: HA-2TB-TT-Proxmox-cluster
        Type: Replicate
        Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
        Status: Started
        Number of Bricks: 1 x 2 = 2
        Transport-type: tcp
        Bricks:
        Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
        Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
        Options Reconfigured:
        nfs.disable: 0
        network.ping-timeout: 10

        Volume Name: HA-WIN-TT-1T
        Type: Replicate
        Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
        Status: Started
        Number of Bricks: 1 x 2 = 2
        Transport-type: tcp
        Bricks:
        Brick1: stor1:/exports/NFS-WIN/1T
        Brick2: stor2:/exports/NFS-WIN/1T
        Options Reconfigured:
        nfs.disable: 1
        network.ping-timeout: 10



        2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri
        <[email protected]>:

            Could you give your 'gluster volume info' output?

            Pranith

            On 10/13/2014 09:36 PM, Roman wrote:

            Hi,

            I've got this kind of setup (the servers run a replica):


            @ 10G backend
            gluster storage1
            gluster storage2
            gluster client1

            @1g backend
            other gluster clients

            The servers have HW RAID5 with SAS disks.

            So today I decided to create a 900GB file for an iSCSI
            target, located on a separate GlusterFS volume, using
            dd (just a dummy file filled with zeros, bs=1G
            count 900).
            First of all, the process took quite a long time; the
            write speed was 130 MB/sec (the client port was
            2 Gbps, the server ports were running at 1 Gbps).
            Then it reported something like "endpoint is not
            connected" and all of my VMs on the other volume started
            to give me IO errors.
            Server load was around 4,6 (12 cores total).

            Maybe it was due to the timeout of 2 secs, so I've made
            it a bit higher, 10 sec.
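
The timeout mentioned here is the `network.ping-timeout` volume option, which shows up as 10 in the `gluster volume info` output earlier in the thread. Below is a sketch of how such a change is typically made with `gluster volume set`. The helper name `ping_timeout_cmd` is hypothetical, and it only prints the command so that it can be reviewed before being run on a storage server.

```shell
# ping_timeout_cmd VOLNAME SECONDS
# Print (not run) the command that sets network.ping-timeout on a volume.
ping_timeout_cmd() {
    printf 'gluster volume set %s network.ping-timeout %s\n' "$1" "$2"
}

ping_timeout_cmd HA-2TB-TT-Proxmox-cluster 10
```

The same option has to be set on each volume separately, which matches the two `network.ping-timeout: 10` lines in the volume listing.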

            Also, while the dd image was being created, the VMs very
            often reported that their disks were slow, like:

            WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).

            Is 130 MB/sec the maximum bandwidth for all of the
            volumes in total? Then why would we need 10G backends?

            HW RAID local speed is 300 MB/sec, so it should not be
            an issue. Any ideas or advice?
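
A back-of-the-envelope estimate suggests why the write speed lands near this figure. It assumes the volume is mounted with the native GlusterFS client, which sends every write to both replicas itself, so the client's uplink is shared between two copies and each copy is further limited by the port of the server receiving it. The link speeds are the ones stated above, and the variable names are illustrative.

```shell
client_mbit=2000   # client port: 2 Gbps
server_mbit=1000   # each server port: 1 Gbps
replicas=2         # "Number of Bricks: 1 x 2 = 2"

# Each write leaves the client once per replica.
per_copy=$(( client_mbit / replicas ))

# A copy cannot go faster than the server port that receives it.
if [ "$per_copy" -gt "$server_mbit" ]; then
    per_copy=$server_mbit
fi

echo "upper bound: $(( per_copy / 8 )) MB/s of file data"
```

Under these assumptions the bound is about 125 MB/s, which is close to the 130 MB/sec reported. This is a rough estimate that ignores protocol overhead and caching, but it points at the 1 Gbps links rather than at the RAID as the likely limit.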


            Maybe someone has an optimized sysctl.conf for a 10G
            backend?

            Mine is pretty simple, the kind that can be found by
            googling.


            Just to mention: those VMs were connected using a
            separate 1 Gbps interface, which means they should not
            be affected by the client on the 10G backend.


            The logs are pretty useless; they just say this during
            the outage:


            [2014-10-13 12:09:18.392910] W
            [client-handshake.c:276:client_ping_cbk]
            0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
            expired

            [2014-10-13 12:10:08.389708] C
            [client-handshake.c:127:rpc_client_ping_timer_expired]
            0-HA-2TB-TT-Proxmox-cluster-client-0: server
            10.250.0.1:49159 has not responded in the last 2
            seconds, disconnecting.

            [2014-10-13 12:10:08.390312] W
            [client-handshake.c:276:client_ping_cbk]
            0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
            expired
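
To see how often the client dropped a brick, the disconnect messages can be counted per connection. This is a small sketch, assuming each log entry sits on a single line in the actual file (the entries above are wrapped by the mail client). The function name `count_ping_timeouts` is hypothetical, and the log path has to be supplied by the reader, since it depends on the mount point.

```shell
# count_ping_timeouts LOGFILE
# Print "<count> <connection>" for every connection whose ping timer expired.
# Fields of a log line: [date  time]  level  [source:line:function]  connection:  message...
count_ping_timeouts() {
    awk '/rpc_client_ping_timer_expired/ {
             conn = $5
             sub(/:$/, "", conn)    # drop the trailing colon
             n[conn]++
         }
         END { for (c in n) print n[c], c }' "$1"
}
```

Run against the three entries quoted above, this would report one expiry for `0-HA-2TB-TT-Proxmox-cluster-client-0`; the two `client_ping_cbk` warnings are not counted.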

            So I decided to set the timeout a bit higher.

            So it seems to me that under high load GlusterFS is not
            usable? 130 MB/s is not so much that it should cause
            timeouts or make the system so slow that the VMs start
            misbehaving.

            Of course, after the disconnection the healing process
            started, but as the VMs had lost connection to both
            servers, it was pretty useless; they could not run
            anymore. And BTW, when you load the server with such a
            huge job (dd of 900GB), the healing process goes
            soooooo slow :)

--Best regards,

            Roman.


            _______________________________________________
            Gluster-users mailing list
            [email protected]  <mailto:[email protected]>
            http://supercolony.gluster.org/mailman/listinfo/gluster-users

--Best regards,

        Roman.

--Best regards,

    Roman.




--
Best regards,
Roman.


