On 04/18/2018 10:35 AM, Artem Russakovskii wrote:
Hi Ravi,

Could you please expand on how these would help?

By forcing 'full' here, we move the load from the CPU to the network, thus decreasing CPU utilization. Is that right?
Yes, 'diff' employs the rchecksum FOP, which computes a sha256 checksum and can consume CPU. So yes, it is sort of shifting the load from the CPU to the network. But if your average file size is small, it would make sense to copy the entire file instead of computing checksums.
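
To make the trade-off concrete, here is a minimal Python sketch of the two strategies. It is an illustration only, not Gluster's rchecksum implementation: the function names and the block size are hypothetical, and real self-heal works on files over the network rather than on in-memory byte strings.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # hypothetical block size, not a Gluster constant


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def heal_diff(source, sink, block_size=BLOCK_SIZE):
    """'diff'-style heal: checksum both copies, copy only mismatched blocks.

    Returns (healed_data, bytes_copied). CPU is spent hashing both copies.
    """
    src_sums = block_checksums(source, block_size)
    sink_sums = block_checksums(sink, block_size)
    healed = bytearray()
    copied = 0
    for idx, src_sum in enumerate(src_sums):
        start = idx * block_size
        if idx < len(sink_sums) and sink_sums[idx] == src_sum:
            # Checksums match: keep the sink's existing block, copy nothing.
            healed += sink[start:start + block_size]
        else:
            # Mismatch or missing block: copy it from the source.
            block = source[start:start + block_size]
            healed += block
            copied += len(block)
    return bytes(healed), copied


def heal_full(source):
    """'full'-style heal: no hashing, every byte is copied."""
    return source, len(source)
```

With small files the hashing cost in `heal_diff` buys very little, since there are few blocks to skip, which is the case where copying the whole file is cheaper overall.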

This assumes the CPU and disk utilization are caused by the diff algorithm and not by lstat and other calls.

    Option: cluster.data-self-heal-algorithm
    Default Value: (null)
    Description: Select between "full", "diff". The "full" algorithm
    copies the entire file from source to sink. The "diff" algorithm
    copies to sink only those blocks whose checksums don't match with
    those of source. If no option is configured the option is chosen
    dynamically as follows: If the file does not exist on one of the
    sinks or empty file exists or if the source file size is about the
    same as page size the entire file will be read and written i.e
    "full" algo, otherwise "diff" algo is chosen.

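The dynamic choice described in the help text above can be sketched roughly as follows. This is only a reading of that description, not Gluster's code: the function name, the 4096-byte default page size, and the interpretation of "about the same as page size" as "no larger than one page" are all assumptions.

```python
def choose_heal_algorithm(configured, sink_size, source_size, page_size=4096):
    """Pick "full" or "diff" following the option's help text.

    configured  -- "full", "diff", or None when the option is unset
    sink_size   -- size of the file on the sink, or None if it does not exist
    source_size -- size of the file on the source
    page_size   -- assumed page size in bytes (hypothetical default)
    """
    if configured in ("full", "diff"):
        # An explicit setting always wins.
        return configured
    if sink_size is None or sink_size == 0:
        # File missing or empty on the sink: nothing to compare against.
        return "full"
    if source_size <= page_size:
        # Assumption: "about the same as page size" read as "fits in a page".
        return "full"
    return "diff"
```

For example, `choose_heal_algorithm(None, 10**6, 10**6)` falls through to "diff", while setting the option to "full" skips the checks entirely.
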
I really have no idea what this means and how/why it would help. Any more info on this option?

https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/granular-entry-self-healing.md should help.

    Option: cluster.granular-entry-heal
    Default Value: no
    Description: If this option is enabled, self-heal will resort to
    granular way of recording changelogs and doing entry self-heal.

Thank you.


Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net <http://beerpla.net/> | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>

On Tue, Apr 17, 2018 at 9:58 PM, Ravishankar N <ravishan...@redhat.com <mailto:ravishan...@redhat.com>> wrote:

    On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
    Following up here on a related issue that is very serious for us.

    I took down one of the 4 replicate gluster servers for
    maintenance today. There are 2 gluster volumes totaling about
    600GB. Not that much data. After the server comes back online, it
    starts auto healing and pretty much all operations on gluster
    freeze for many minutes.

    For example, I was trying to run an ls -alrt in a folder with
    7300 files, and it took a good 15-20 minutes before returning.

    During this time, I can see iostat show 100% utilization on the
    brick, heal status takes many minutes to return, glusterfsd uses
    up tons of CPU (I saw it spike to 600%). gluster already has
    massive performance issues for me, but healing after a 4-hour
    downtime is on another level of bad perf.

    For example, this command took many minutes to run:

    gluster volume heal androidpolice_data3 info summary
    Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
    Status: Connected
    Total Number of entries: 91
    Number of entries in heal pending: 90
    Number of entries in split-brain: 0
    Number of entries possibly healing: 1

    Brick forge:/mnt/forge_block4/androidpolice_data3
    Status: Connected
    Total Number of entries: 87
    Number of entries in heal pending: 86
    Number of entries in split-brain: 0
    Number of entries possibly healing: 1

    Brick hive:/mnt/hive_block4/androidpolice_data3
    Status: Connected
    Total Number of entries: 87
    Number of entries in heal pending: 86
    Number of entries in split-brain: 0
    Number of entries possibly healing: 1

    Brick citadel:/mnt/citadel_block4/androidpolice_data3
    Status: Connected
    Total Number of entries: 0
    Number of entries in heal pending: 0
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0
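
Since the summary command itself can take minutes, it may help to capture its output once and total it offline. Below is a small Python sketch that parses text in the format shown above; the function names are hypothetical, and it assumes the output layout matches this paste exactly (a "Brick ..." line followed by "key: value" lines).

```python
def parse_heal_summary(text):
    """Parse 'heal info summary' style text into {brick: {field: value}}."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Start of a new brick section, e.g. "Brick host:/path".
            current = line[len("Brick "):]
            bricks[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Keep counters as ints; leave anything else (e.g. Status) as text.
            bricks[current][key.strip()] = int(value) if value.isdigit() else value
    return bricks


def total_pending(bricks):
    """Sum the 'heal pending' counters across all bricks."""
    total = 0
    for fields in bricks.values():
        value = fields.get("Number of entries in heal pending", 0)
        if isinstance(value, int):
            total += value
    return total
```

Feeding it the saved output and calling `total_pending(parse_heal_summary(saved_text))` gives a single number to track between runs.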

    Statistics showed a diminishing number of failed heals:
    Ending time of crawl: Tue Apr 17 21:13:08 2018

    Type of crawl: INDEX
    No. of entries healed: 2
    No. of entries in split-brain: 0
    No. of heal failed entries: 102

    Starting time of crawl: Tue Apr 17 21:13:09 2018

    Ending time of crawl: Tue Apr 17 21:14:30 2018

    Type of crawl: INDEX
    No. of entries healed: 4
    No. of entries in split-brain: 0
    No. of heal failed entries: 91

    Starting time of crawl: Tue Apr 17 21:14:31 2018

    Ending time of crawl: Tue Apr 17 21:15:34 2018

    Type of crawl: INDEX
    No. of entries healed: 0
    No. of entries in split-brain: 0
    No. of heal failed entries: 88
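
The "diminishing" trend in these crawl statistics can be checked mechanically. The following Python sketch pulls the failed-entry counts out of text in the format pasted above; the function names are hypothetical and the regular expression assumes the exact "No. of heal failed entries: N" wording shown here.

```python
import re

FAILED_RE = re.compile(r"No\. of heal failed entries:\s*(\d+)")


def failed_counts(stats_text):
    """Return the failed-entry count of each crawl, in order of appearance."""
    return [int(n) for n in FAILED_RE.findall(stats_text)]


def is_diminishing(counts):
    """True if every crawl reports no more failures than the one before it."""
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))
```

A list that stops shrinking across several crawls would suggest the heal is stuck rather than merely slow.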

    Eventually, everything heals and goes back to at least where the
    roof isn't on fire anymore.

    The server stats and volume options were given in one of the
    previous replies to this thread.

    Any ideas or things I could run and show the output of to help
    diagnose? I'm also very open to working with someone on the team
    on a live debugging session if there's interest.

    It is likely that self-heal is causing the CPU spike due to the
    flood of lookups/locks and checksum fops that the
    self-heal-daemon sends to the bricks.
    There's a script to control shd's cpu usage using cgroups. That
    should help in regulating self-heal traffic:
    <https://review.gluster.org/#/c/18404/> (see
    Other self-heal related volume options that you could change are
    setting 'cluster.data-self-heal-algorithm' to 'full' and
    'granular-entry-heal' to 'enable'.  `gluster volume set help`
    should give you more information about these options.

    Thank you.


    On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii
    <archon...@gmail.com <mailto:archon...@gmail.com>> wrote:

        Hi Vlad,

        I actually saw that post already and even asked a question 4
        days ago
        The accepted answer also seems to go against your suggestion
        to enable direct-io-mode as it says it should be disabled for
        better performance when used just for file accesses.

        It'd be great if someone from the Gluster team chimed in
        about this thread.


        On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov
        <vladk...@gmail.com <mailto:vladk...@gmail.com>> wrote:

            I wish I knew, or was able to get, a detailed description of
            those options myself.
            here is direct-io-mode
            Same as you I ran tests on a large volume of files,
            finding that main delays are in attribute calls, ending
            up with those mount options to add performance.
            I discovered those options through basically googling
            this user list with people sharing their tests.
            Not sure I would share your optimism, and rather than
            going up I downgraded to 3.12 and have no dir view issue
            now. Though I had to recreate the cluster and had to
            re-add bricks with existing data.

            On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii
            <archon...@gmail.com <mailto:archon...@gmail.com>> wrote:

                Hi Vlad,

                I'm using only localhost: mounts.

                Can you please explain what effect each option has on
                performance issues shown in my posts?
                From what I remember, direct-io-mode=enable didn't
                make a difference in my tests, but I suppose I can
                try again. The explanations about direct-io-mode are
                quite confusing on the web in various guides, saying
                enabling it could make performance worse in some
                situations and better in others due to OS file cache.

                There are also these gluster volume settings, adding
                to the confusion:
                Option: performance.strict-o-direct
                Default Value: off
                Description: This option when set to off, ignores the
                O_DIRECT flag.

                Option: performance.nfs.strict-o-direct
                Default Value: off
                Description: This option when set to off, ignores the
                O_DIRECT flag.

                Re: 4.0. I moved to 4.0 after finding out that it
                fixes the disappearing dirs bug related to
                cluster.readdir-optimize if you remember
                I was already on 3.13 by then, and 4.0 resolved the
                issue. It's been stable for me so far, thankfully.


                On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov
                <vladk...@gmail.com <mailto:vladk...@gmail.com>> wrote:

                    You definitely need to add mount options to /etc/fstab;
                    use the ones from here

                    I went on with using local mounts to achieve
                    performance as well

                    Also, 3.12 or 3.10 branches would be preferable
                    for production

                    On Fri, Apr 6, 2018 at 4:12 AM, Artem
                    Russakovskii <archon...@gmail.com
                    <mailto:archon...@gmail.com>> wrote:

                        Hi again,

                        I'd like to expand on the performance issues
                        and plead for help. Here's one case which
                        shows these odd hiccups:

                        In this GIF where I switch back and forth
                        between copy operations on 2 servers, I'm
                        copying a 10GB dir full of .apk and image files.

                        On server "hive" I'm copying straight from
                        the main disk to an attached volume block
                        (xfs). As you can see, the transfers are
                        relatively speedy and don't hiccup.
                        On server "citadel" I'm copying the same set
                        of data to a 4-replicate gluster which uses
                        block storage as a brick. As you can see,
                        performance is much worse, and there are
                        frequent pauses for many seconds where
                        nothing seems to be happening - just freezes.

                        All 4 servers have the same specs, and all of
                        them have performance issues with gluster and
                        no such issues when raw xfs block storage is

                        hive has long finished copying the data,
                        while citadel is barely chugging along and is
                        expected to take probably half an hour to an
                        hour. I have over 1TB of data to migrate, at
                        which point if we went live, I'm not even
                        sure gluster would be able to keep up instead
                        of bringing the machines and services down.

                        Here's the cluster config, though it didn't
                        seem to make any difference performance-wise
                        before I applied the customizations vs after.

                        Volume Name: apkmirror_data1
                        Type: Replicate
                        Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
                        Status: Started
                        Snapshot Count: 0
                        Number of Bricks: 1 x 4 = 4
                        Transport-type: tcp
                        Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
                        Brick2: forge:/mnt/forge_block1/apkmirror_data1
                        Brick3: hive:/mnt/hive_block1/apkmirror_data1
                        Options Reconfigured:
                        cluster.quorum-count: 1
                        cluster.quorum-type: fixed
                        network.ping-timeout: 5
                        network.remote-dio: enable
                        performance.rda-cache-limit: 256MB
                        performance.readdir-ahead: on
                        performance.parallel-readdir: on
                        network.inode-lru-limit: 500000
                        performance.md-cache-timeout: 600
                        performance.cache-invalidation: on
                        performance.stat-prefetch: on
                        features.cache-invalidation-timeout: 600
                        features.cache-invalidation: on
                        cluster.readdir-optimize: on
                        performance.io-thread-count: 32
                        server.event-threads: 4
                        client.event-threads: 4
                        performance.read-ahead: off
                        cluster.lookup-optimize: on
                        performance.cache-size: 1GB
                        cluster.self-heal-daemon: enable
                        transport.address-family: inet
                        nfs.disable: on
                        performance.client-io-threads: on

                        The mounts are done as follows in /etc/fstab:
                        /mnt/citadel_block1 xfs defaults 0 2
                        /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0

                        I'm really not sure if direct-io-mode mount
                        tweaks would do anything here, what the value
                        should be set to, and what it is by default.

                        The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM,
                        20 CPUs, hosted by Linode.

                        I'd really appreciate any help in the matter.

                        Thank you.


                        On Thu, Apr 5, 2018 at 11:13 PM, Artem
                        Russakovskii <archon...@gmail.com
                        <mailto:archon...@gmail.com>> wrote:


                            I'm trying to squeeze performance out of
                            Gluster on four 80GB-RAM, 20-CPU machines
                            where Gluster runs on attached block
                            storage (Linode) with 4 replicate bricks,
                            and so far everything I've tried results in
                            sub-optimal performance.

                            There are many files - mostly images,
                            several million - and many operations
                            take minutes, copying multiple files
                            (even if they're small) suddenly freezes
                            up for seconds at a time, then continues,
                            iostat frequently shows large r_await and
                            w_awaits with 100% utilization for the
                            attached block device, etc.

                            But anyway, there are many guides out
                            there for small-file performance
                            improvements, but more explanation is
                            needed, and I think more tweaks should be

                            My question today is
                            about performance.cache-size. Is this a
                            size of cache in RAM? If so, how do I
                            view the current cache size to see if it
                            gets full and I should increase its size?
                            Is it advisable to bump it up if I have
                            many tens of gigs of RAM free?

                            More generally, in the last 2 months
                            since I first started working with
                            gluster and set a production system live,
                            I've been feeling frustrated because
                            Gluster has a lot of poorly-documented
                            and confusing options. I really wish
                            documentation could be improved with
                            examples and better explanations.

                            Specifically, it'd be absolutely amazing
                            if the docs offered a strategy for
                            setting each value and ways of
                            determining more optimal values. For
                            example, for performance.cache-size, if
                            it said something like "run command abc
                            to see your current cache size, and if
                            it's hurting, up it, but be aware that
                            it's limited by RAM," it'd be already a
                            huge improvement to the docs. And so on
                            with other options.

                            The gluster team is quite helpful on this
                            mailing list, but in a reactive rather
                            than proactive way. Perhaps it's tunnel
                            vision once you've worked on a project
                            for so long where less technical
                            explanations and even proper
                            documentation of options takes a back
                            seat, but I encourage you to be more
                            proactive about helping us understand and
                            optimize Gluster.

                            Thank you.


                        Gluster-users mailing list

    Gluster-users mailing list
    Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
