Re: [Gluster-users] 100% cpu on brick replication

Pranith Kumar Karampuri Fri, 29 May 2015 01:17:27 -0700

Could you give gluster volume info output?

Pranith


On 05/29/2015 01:18 PM, Pedro Oriani wrote:

I've set

cluster.entry-self-heal: off

Maybe I've missed something, and when I started the service on srv02 it seemed to do the job.

Then I restarted the service.

on srv02

11607 ? Ssl 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/eb93ca526d4559069efc40da9c71b3a4.socket --xlator-option *replicate*.node-uuid=7207ea30-41e9-4344-8fc3-47743b83629e
11612 ? Ssl 0:03 /usr/sbin/glusterfsd -s 172.16.0.2 --volfile-id vol1.172.16.0.2.data-glusterfs-vol1-brick1-brick -p /var/lib/glusterd/vols/vol1/run/172.16.0.2-data-glusterfs-vol1-brick1-brick.pid -S /var/run/gluster/09285d60c2c8c9aa546602147a99a347.socket --brick-name /data/glusterfs/vol1/brick1/brick -l /var/log/glusterfs/bricks/data-glusterfs-vol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=7207ea30-41e9-4344-8fc3-47743b83629e --brick-port 49154 --xlator-option vol1-server.listen-port=49154



It seems like self-healing starts and brings down srv01, with 600% load.
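
One way to confirm that a heal backlog is what drives the load is to watch the number of pending entries shrink over time. The helper below is only a sketch: the function name `count_pending_heals` is hypothetical, and it assumes that `gluster volume heal <volname> info` prints one `Number of entries: N` line per brick, which may differ between releases.

```shell
# Sketch: sum the per-brick "Number of entries: N" lines read from stdin.
# count_pending_heals is an illustrative name, not a gluster command.
# Assumed usage (not run here):
#   gluster volume heal vol1 info | count_pending_heals
count_pending_heals() {
    # Split each matching line on ": " and add up the second field.
    # "total + 0" makes the function print 0 when no line matches.
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Running it every few minutes while srv02 is catching up gives a rough idea of whether the heal is progressing or stuck.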

thanks,
Pedro

------------------------------------------------------------------------
Date: Fri, 29 May 2015 12:37:19 +0530
From: [email protected]
To: [email protected]
CC: [email protected]
Subject: Re: [Gluster-users] 100% cpu on brick replication



On 05/29/2015 12:34 PM, Pedro Oriani wrote:

    Hi Pranith,

    It's surely related to a replication / healing task, because it
    occurs when you create a new replicated brick or when you bring
    an old one back online.
    The problem is that the cpu load on the online brick is so high
    that I cannot do normal operations.
    In my case when a replication / healing occurs, the cluster cannot
    serve content.
    I'm asking if there is a way to limit cpu usage in this case, or
    set a less aggressive mode, because otherwise I have to rethink
    the image repository.

Disable self-heal. I see that you already did that for the self-heal daemon. Let's do that for the mounts as well.

gluster volume set <volname> cluster.entry-self-heal off
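
As a rough sketch of applying this to every mount-side heal option in one go: the loop below covers `cluster.entry-self-heal` (the option above), `cluster.metadata-self-heal` (already off in the posted volume info) and `cluster.data-self-heal` (not mentioned in this thread, added here as an assumption). The `VOLNAME` and `GLUSTER` variables and the function name are illustrative. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: turn off the client-side (mount) self-heal options for one volume.
# VOLNAME and GLUSTER are illustrative variables, not gluster settings.
VOLNAME="${VOLNAME:-vol1}"
# Dry run by default: the commands are echoed, not executed.
# Set GLUSTER=gluster in the environment to actually apply them.
GLUSTER="${GLUSTER:-echo gluster}"

disable_client_self_heal() {
    # $1 is the volume name.
    for opt in cluster.data-self-heal cluster.metadata-self-heal cluster.entry-self-heal; do
        $GLUSTER volume set "$1" "$opt" off
    done
}

disable_client_self_heal "$VOLNAME"
```

Keeping the dry run as the default makes it easy to review the exact commands before touching a production volume.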

Let me know how that goes.

Pranith


    thanks,
    Pedro

    ------------------------------------------------------------------------
    Date: Fri, 29 May 2015 11:14:29 +0530
    From: [email protected]
    To: [email protected]; [email protected]
    Subject: Re: [Gluster-users] 100% cpu on brick replication



    On 05/27/2015 08:48 PM, Pedro Oriani wrote:

        Hi All,
        I'm writing because I'm experiencing an issue with gluster's
        replication feature.
        I have a brick on srv1 with about 2TB of mixed-size files,
        ranging from 10k to 300k.
        When I add a new replication brick on srv2, the glusterfs
        process takes all the CPU.
        This is unacceptable because the volume stops responding to
        normal r/w requests.

        Glusterfs version is 3.7.0

    Is it because of self-heals? Was the brick offline until then?

    Pranith


        The underlying volume is xfs.


        Volume Name: vol1
        Type: Replicate
        Volume ID:
        Status: Started
        Number of Bricks: 1 x 2 = 2
        Transport-type: tcp
        Bricks:
        Brick1: 172.16.0.1:/data/glusterfs/vol1/brick1/brick
        Brick2: 172.16.0.2:/data/glusterfs/vol1/brick1/brick
        Options Reconfigured:
        performance.cache-size: 1gb
        cluster.self-heal-daemon: off
        cluster.data-self-heal-algorithm: full
        cluster.metadata-self-heal: off
        performance.cache-max-file-size: 2MB
        performance.cache-refresh-timeout: 1
        performance.stat-prefetch: off
        performance.read-ahead: on
        performance.quick-read: off
        performance.write-behind-window-size: 4MB
        performance.flush-behind: on
        performance.write-behind: on
        performance.io-thread-count: 32
        performance.io-cache: on
        network.ping-timeout: 2
        nfs.addr-namelookup: off
        performance.strict-write-ordering: on
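
With this many reconfigured options it is easy to lose track of which self-heal settings are actually in effect. The filter below is a minimal sketch (the function name `list_self_heal_options` is hypothetical) that keeps only the `cluster.*self-heal*` lines from `gluster volume info` output, assuming the `option: value` layout shown above.

```shell
# Sketch: print only the self-heal related options from `gluster volume info`
# output read on stdin. list_self_heal_options is an illustrative name.
# Assumed usage (not run here):
#   gluster volume info vol1 | list_self_heal_options
list_self_heal_options() {
    # Match lines such as "cluster.self-heal-daemon: off", with or without
    # leading indentation, then strip that indentation.
    grep -E '^[[:space:]]*cluster\.[a-z-]*self-heal[a-z-]*:' \
        | sed 's/^[[:space:]]*//'
}
```

Applied to the listing above, this would leave the `cluster.self-heal-daemon`, `cluster.data-self-heal-algorithm` and `cluster.metadata-self-heal` lines, which makes it visible that no entry or data self-heal setting had been changed at that point.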


        Is there any parameter or hint that I can follow to limit CPU
        usage, so that replication runs with little lag on normal
        operations?

        Thanks


        _______________________________________________
        Gluster-users mailing list
        [email protected]
        http://www.gluster.org/mailman/listinfo/gluster-users

