The libvirt log does not contain anything related. The error messages come from dmesg inside the virtual machine.

What I can see in the gluster logs is that the connection between the two peers was lost. The physical connection, however, was up the whole time.
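
For completeness, a minimal sketch of how the peer connection could be cross-checked from either node (assuming the standard gluster CLI and the 10.11.100.x addresses from the volume info further down):

    gluster peer status           # gluster's view of the other peer
    ping -c 3 10.11.100.2         # the raw network path, independent of gluster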

On 5 Sep 2014 at 0:47, Joe Julian wrote:
That is about as far removed from anything useful for troubleshooting as possible. You're reporting a symptom from within a virtualized environment. It's the real systems that have the useful logs. Any errors in the client or brick logs? Libvirt logs? dmesg on the server? Is either CPU-bound? In swap?
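
(For reference, a rough sketch of how that information could be gathered on each Gluster host, assuming the default /var/log/glusterfs log locations:

    # brick logs and FUSE client (mount) logs
    tail -n 100 /var/log/glusterfs/bricks/*.log
    tail -n 100 /var/log/glusterfs/*.log
    # kernel messages on the server
    dmesg | tail -n 50
    # CPU load and swap usage
    top -b -n 1 | head -n 20
    free -m

The libvirt side would typically be covered by /var/log/libvirt/qemu/<domain>.log on the hypervisor.)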


On September 4, 2014 9:12:16 PM PDT, "Miloš Kozák" <[email protected]> wrote:

    Hi,

    I ran a few more tests. I moved a file which is a VM image onto the GlusterFS
    mount, and under that load I got this on the console of a running VM:

    lost page write due to I/O error on vda1
    Buffer I/O error on device vda1, logical block 1049638
    lost page write due to I/O error on vda1
    Buffer I/O error on device vda1, logical block 1049646
    lost page write due to I/O error on vda1
    Buffer I/O error on device vda1, logical block 1049647
    lost page write due to I/O error on vda1
    Buffer I/O error on device vda1, logical block 1049649
    lost page write due to I/O error on vda1
    end_request: I/O error, dev vda, sector 8399688
    end_request: I/O error, dev vda, sector 8399728
    end_request: I/O error, dev vda, sector 8399736
    end_request: I/O error, dev vda, sector 8399776
    end_request: I/O error, dev vda, sector 8399792
    __ratelimit: 5 callbacks suppressed
    EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
    EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
    EXT4-fs error (device vda1): ext4_find_entry: reading directory #132029 offset 0

    Do you think it is related to the options set on the volume?

          storage.owner-gid: 498
          storage.owner-uid: 498
          network.ping-timeout: 2
          performance.io-thread-count: 3
          cluster.server-quorum-type: server
          network.remote-dio: enable
          cluster.eager-lock: enable
          performance.stat-prefetch: off
          performance.io-cache: off
          performance.read-ahead: off
          performance.quick-read: off
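
    (For reference, these options would have been set one at a time with the
    standard gluster CLI; a minimal sketch, using the ph-fs-0 volume name that
    appears further down in the thread:

          gluster volume set ph-fs-0 network.ping-timeout 2
          gluster volume set ph-fs-0 network.remote-dio enable
          gluster volume set ph-fs-0 cluster.eager-lock enable
          gluster volume set ph-fs-0 performance.stat-prefetch off

    and likewise for the remaining storage.* and performance.* options.)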

    Thanks, Milos


    On 2014-09-03 at 04:01 PM, Milos Kozak wrote:

        I have just tried to copy a VM image (raw) and it causes the
        same problem. I have GlusterFS 3.5.2.

        On 9/3/2014 9:14 AM, Roman wrote:

            Hi, I had some issues with files generated from /dev/zero
            also. Try real files or /dev/urandom :) I don't know if
            there is a real issue/bug with files generated from
            /dev/zero; devs should check it out, /me thinks.
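
            (For illustration, a /dev/urandom variant of the dd test
            quoted further down would look like:

                dd if=/dev/urandom of=test2.img bs=1M count=20000 conv=fdatasync

            keeping in mind that /dev/urandom is much slower to read
            than /dev/zero, so the raw write rate will be lower.)
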
            2014-09-03 16:11 GMT+03:00 Milos Kozak <[email protected]>:
            Hi, I am facing a quite strange problem with two servers that have
            the same configuration and the same hardware. The servers are
            connected by bonded 1GE. I have one volume:

            [root@nodef02i 103]# gluster volume info
            Volume Name: ph-fs-0
            Type: Replicate
            Volume ID: f8f569ea-e30c-43d0-bb94-b2f1164a7c9a
            Status: Started
            Number of Bricks: 1 x 2 = 2
            Transport-type: tcp
            Bricks:
            Brick1: 10.11.100.1:/gfs/s3-sata-10k/fs
            Brick2: 10.11.100.2:/gfs/s3-sata-10k/fs
            Options Reconfigured:
            storage.owner-gid: 498
            storage.owner-uid: 498
            network.ping-timeout: 2
            performance.io-thread-count: 3
            cluster.server-quorum-type: server
            network.remote-dio: enable
            cluster.eager-lock: enable
            performance.stat-prefetch: off
            performance.io-cache: off
            performance.read-ahead: off
            performance.quick-read: off

            The volume is intended to host virtual servers (KVM); the
            configuration is according to the gluster blog. Currently I have
            only one virtual server deployed on top of this volume, in order to
            see the effects of my stress tests. During the tests I write to the
            volume, mounted through FUSE, with dd (currently only one write at
            a time):

            dd if=/dev/zero of=test2.img bs=1M count=20000 conv=fdatasync

            Test 1) I run dd on nodef02i. The load on nodef02i is at most 1 erl,
            but on nodef01i it is around 14 erl (I have a 12-thread CPU). After
            the write is done the load on nodef02i goes down, but on nodef01i it
            goes up to 28 erl and stays there for 20 minutes. In the meantime I
            can see:

            [root@nodef01i 103]# gluster volume heal ph-fs-0 info
            Volume ph-fs-0 is not started (Or) All the bricks are not running.
            Volume heal failed

            [root@nodef02i 103]# gluster volume heal ph-fs-0 info
            Brick nodef01i.czprg:/gfs/s3-sata-10k/fs/
            /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
            /test.img - Possibly undergoing heal
            Number of entries: 2
            Brick nodef02i.czprg:/gfs/s3-sata-10k/fs/
            /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
            /test.img - Possibly undergoing heal
            Number of entries: 2

            [root@nodef01i 103]# gluster volume status
            Status of volume: ph-fs-0
            Gluster process                              Port    Online  Pid
            ------------------------------------------------------------------------
            Brick 10.11.100.1:/gfs/s3-sata-10k/fs        49152   Y       56631
            Brick 10.11.100.2:/gfs/s3-sata-10k/fs        49152   Y       3372
            NFS Server on localhost                      2049    Y       56645
            Self-heal Daemon on localhost                N/A     Y       56649
            NFS Server on 10.11.100.2                    2049    Y       3386
            Self-heal Daemon on 10.11.100.2              N/A     Y       3387

            Task Status of Volume ph-fs-0
            ------------------------------------------------------------------------
            There are no active volume tasks

            This very high load lasts another 20-30 minutes. During the first
            test I restarted the glusterd service after 10 minutes because it
            seemed to me that the service was not working, yet I could still
            see a very high load on nodef01i. Consequently, the virtual server
            reports errors about problems with its EXT4 filesystem and MySQL
            stops. When the load culminated I tried to run the same test from
            the opposite direction, writing (dd) from nodef01i - test2. More or
            less the same thing happened: an extremely high load on nodef01i
            and a minimal load on nodef02i. The heal outputs were more or less
            the same. I would like to tweak this, but I don't know what I
            should focus on. Thank you for your help. Milos
            
            --
            Best regards,
            Roman.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Attachment: test3-nodef02i.tar.bz2
Description: Binary data

Attachment: test3-nodef01i.tar.bz2
Description: Binary data

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
