Hi Pranith,

The "dd" command was:

    dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

There were two instances where dd reported 22 seconds. The output from the dd tests is in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt

Pat

On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
Pat,
What is the command you used? As per the following output, it seems like at least one write operation took 16 seconds, which is really bad:
      96.39      1165.10 us      89.00 us      *16487014.00 us*      393212      WRITE


On Tue, May 30, 2017 at 10:36 PM, Pat Haley <[email protected]> wrote:


    Hi Pranith,

    I ran the same 'dd' test both in the gluster test volume and in
    the .glusterfs directory of each brick.  The median results (12 dd
    trials in each test) are similar to before:

      * gluster test volume: 586.5 MB/s
      * bricks (in .glusterfs): 1.4 GB/s

    The profile for the gluster test-volume is in

    http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt

    Thanks

    Pat




    On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
    Let's start with the same 'dd' test we were testing with, to see
    what the numbers are. Please provide profile numbers for the
    same. From there on we will start tuning the volume to see what
    we can do.
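
    In case it helps, the sequence I usually use to capture profile
    numbers is roughly the following (a sketch, assuming the volume
    is named test-volume as earlier in this thread):

        # reset/start collecting per-fop statistics for the volume
        gluster volume profile test-volume start
        # ... run the dd test against the mount point ...
        # dump the accumulated latency and fop counts
        gluster volume profile test-volume info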

    On Tue, May 30, 2017 at 9:16 PM, Pat Haley <[email protected]> wrote:


        Hi Pranith,

        Thanks for the tip.  We now have the gluster volume mounted
        under /home.  What tests do you recommend we run?

        Thanks

        Pat



        On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:


        On Tue, May 16, 2017 at 9:20 PM, Pat Haley <[email protected]> wrote:


            Hi Pranith,

            Sorry for the delay.  I never received your reply (but I
            did receive Ben Turner's follow-up to it).  So we tried
            to create a gluster volume under /home using different
            variations of

            gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2 transport tcp

            However we keep getting errors of the form

            Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>

            Any thoughts on what we're doing wrong?


        I think 'transport tcp' should come before the brick list.
        In any case, tcp is the default transport, so there is no
        need to specify it at all; just remove those two words from
        the command.
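
        A minimal sketch of the corrected command, using the same
        bricks as in your attempt (tcp is implied):

            gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2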


            Also, do you have a list of the tests we should run once
            we get this volume created?  Given the time-zone
            difference, it might help if we can run a small battery
            of tests and post all the results at once, rather than
            iterating one test and one post at a time.


        As far as I remember, this is the first time I am doing
        performance analysis with a user. In our team there are
        separate engineers who do these tests; Ben, who replied
        earlier, is one such engineer.

        Ben,
            Have any suggestions?


            Thanks

            Pat



            On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:


            On Thu, May 11, 2017 at 9:32 PM, Pat Haley <[email protected]> wrote:


                Hi Pranith,

                The /home partition is mounted as ext4:
                /home ext4 defaults,usrquota,grpquota   1 2

                The brick partitions are mounted as xfs:
                /mnt/brick1 xfs defaults 0 0
                /mnt/brick2 xfs defaults 0 0

                Will this cause a problem with creating a volume
                under /home?


            I don't think the bottleneck is the disk. Could you run
            the same tests you did before on your new volume to
            confirm?


                Pat



                On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:


                On Thu, May 11, 2017 at 8:57 PM, Pat Haley <[email protected]> wrote:


                    Hi Pranith,

                    Unfortunately, we don't have similar hardware
                    for a small scale test.  All we have is our
                    production hardware.


                You said something about the /home partition, which
                has fewer disks; we can create a plain distribute
                volume inside one of its directories. After we are
                done, we can remove the setup. What do you say?


                    Pat




                    On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:


                    On Thu, May 11, 2017 at 2:48 AM, Pat Haley <[email protected]> wrote:


                        Hi Pranith,

                        Since we are mounting the partitions as
                        the bricks, I tried the dd test writing
                        to
                        <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
                        The results without oflag=sync were 1.6
                        Gb/s (faster than gluster but not as fast
                        as I was expecting given the 1.2 Gb/s to
                        the no-gluster area w/ fewer disks).


                    Okay, then 1.6 Gb/s is what we need to target,
                    considering your volume is just distribute. Is
                    there any way you can run tests on similar
                    hardware but at a small scale, just so we can run
                    the workload and learn more about the bottlenecks
                    in the system? We can probably try to get the
                    speed to 1.2 Gb/s on the /home partition you were
                    telling me about yesterday. Let me know if that
                    is something you are okay to do.


                        Pat



                        On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:


                        On Wed, May 10, 2017 at 10:15 PM, Pat Haley <[email protected]> wrote:


                            Hi Pranith,

                            Not entirely sure (this isn't my
                            area of expertise). I'll run your
                            answer by some other people who are
                            more familiar with this.

                            I am also uncertain about how to
                            interpret the results when we also add
                            the dd tests writing to the /home area
                            (no gluster, still on the same machine):

                              * dd test without oflag=sync
                                (rough average of multiple tests)
                                  o gluster w/ fuse mount : 570 Mb/s
                                  o gluster w/ nfs mount: 390 Mb/s
                                  o nfs (no gluster):  1.2 Gb/s
                              * dd test with oflag=sync (rough
                                average of multiple tests)
                                  o gluster w/ fuse mount:  5 Mb/s
                                  o gluster w/ nfs mount: 200 Mb/s
                                  o nfs (no gluster): 20 Mb/s
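
                            (Both variants are of the form below;
                            the exact invocations may have differed
                            slightly:

                                # without oflag=sync
                                dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
                                # with oflag=sync
                                dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync oflag=sync
                            )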

                            Given that the non-gluster area is a
                            RAID-6 of 4 disks while each brick of the
                            gluster area is a RAID-6 of 32 disks, I
                            would naively expect the writes to the
                            gluster area to be roughly 8x faster than
                            to the non-gluster area.


                        I think a better test is to write to a file
                        over nfs, without any gluster, to a location
                        that is not inside the brick but somewhere
                        else on the same disk(s). If you are mounting
                        the partition as the brick, then we can write
                        to a file inside the .glusterfs directory,
                        something like
                        <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
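
                        For example, something along these lines (the
                        file name here is just a placeholder; remove
                        the file after the test):

                            # write 4 GB directly under the brick's .glusterfs directory
                            dd if=/dev/zero count=4096 bs=1048576 conv=sync of=<brick-path>/.glusterfs/dd_test_to_remove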



                            I still think we have a speed issue; I
                            can't tell if fuse vs nfs is part of the
                            problem.


                        I got interested in the post because I read
                        that fuse speed is lower than nfs speed,
                        which is counter-intuitive to my
                        understanding, so I wanted clarifications.
                        Now that I have them (fuse outperformed nfs
                        without sync), we can resume testing as
                        described above and try to find what it is.
                        Based on your email-id I am guessing you are
                        in Boston and I am in Bangalore, so if you
                        are okay with this debugging taking multiple
                        days because of the timezones, I will be
                        happy to help. Please be a bit patient with
                        me; I am under a release crunch, but I am
                        very curious about the problem you posted.

                            Was there anything useful in the
                            profiles?


                        Unfortunately the profiles didn't help me
                        much. I think we are collecting them from an
                        active volume, so they contain a lot of
                        information not pertaining to dd, which makes
                        it difficult to isolate dd's contribution. So
                        I went through your post again and found
                        something I hadn't paid much attention to
                        earlier, i.e. oflag=sync, then did my own
                        tests on my setup with FUSE and sent that
                        reply.


                            Pat



                            On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
                            Okay good. At least this validates my
                            doubts. Handling O_SYNC in gluster NFS
                            and fuse is a bit different.
                            When an application opens a file with
                            O_SYNC on a fuse mount, each write
                            syscall has to be written to disk as part
                            of the syscall, whereas in the case of
                            NFS there is no concept of open. NFS
                            performs the write through a handle,
                            saying it needs to be a synchronous
                            write, so the write() syscall is
                            performed first and then it performs
                            fsync(); a write on an fd with O_SYNC
                            becomes write+fsync. My guess is that
                            when multiple threads do this
                            write+fsync() operation on the same file,
                            multiple writes are batched together to
                            be written to disk, so the throughput on
                            the disk increases.

                            Does that answer your doubts?
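
                            As a rough command-line illustration of
                            per-write sync versus write-then-fsync
                            (just the analogous dd flags, not what
                            gluster NFS does internally):

                                # every write() is synchronous: the output fd is opened with O_SYNC
                                dd if=/dev/zero count=1024 bs=1048576 of=testfile oflag=sync
                                # plain writes, with fsync() called once before dd exits
                                dd if=/dev/zero count=1024 bs=1048576 of=testfile conv=fsync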

                            On Wed, May 10, 2017 at 9:35 PM, Pat Haley <[email protected]> wrote:


                                Without oflag=sync and with only a
                                single test of each, FUSE is going
                                faster than NFS:

                                FUSE:
                                mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
                                4096+0 records in
                                4096+0 records out
                                4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s

                                NFS:
                                mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
                                4096+0 records in
                                4096+0 records out
                                4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s



                                On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
                                Could you let me know the speed
                                without oflag=sync on both the
                                mounts? No need to collect profiles.

                                On Wed, May 10, 2017 at 9:17 PM, Pat Haley <[email protected]> wrote:


                                    Here is what I see now:

                                    [root@mseas-data2 ~]# gluster volume info

                                    Volume Name: data-volume
                                    Type: Distribute
                                    Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
                                    Status: Started
                                    Number of Bricks: 2
                                    Transport-type: tcp
                                    Bricks:
                                    Brick1: mseas-data2:/mnt/brick1
                                    Brick2: mseas-data2:/mnt/brick2
                                    Options Reconfigured:
                                    diagnostics.count-fop-hits: on
                                    diagnostics.latency-measurement: on
                                    nfs.exports-auth-enable: on
                                    diagnostics.brick-sys-log-level: WARNING
                                    performance.readdir-ahead: on
                                    nfs.disable: on
                                    nfs.export-volumes: off



                                    On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
                                    Is this the volume info you have?

                                    > [root@mseas-data2 ~]# gluster volume info
                                    > Volume Name: data-volume
                                    > Type: Distribute
                                    > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
                                    > Status: Started
                                    > Number of Bricks: 2
                                    > Transport-type: tcp
                                    > Bricks:
                                    > Brick1: mseas-data2:/mnt/brick1
                                    > Brick2: mseas-data2:/mnt/brick2
                                    > Options Reconfigured:
                                    > performance.readdir-ahead: on
                                    > nfs.disable: on
                                    > nfs.export-volumes: off

                                    I copied this from an old thread from 2016. This is a distribute volume. Did you change any of the options in between?

--
Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  [email protected]
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
