Hi Ben,

Here is the expected output:
[root@node048 ~]# iperf3 -c 10.0.4.1
Connecting to host 10.0.4.1, port 5201
[  4] local 10.0.5.48 port 44151 connected to 10.0.4.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.86 GBytes  15.9 Gbits/sec    0   8.24 MBytes       
[  4]   1.00-2.00   sec  1.94 GBytes  16.7 Gbits/sec    0   8.24 MBytes       
[  4]   2.00-3.00   sec  1.95 GBytes  16.8 Gbits/sec    0   8.24 MBytes       
[  4]   3.00-4.00   sec  1.86 GBytes  16.0 Gbits/sec    0   8.24 MBytes       
[  4]   4.00-5.00   sec  1.85 GBytes  15.8 Gbits/sec    0   8.24 MBytes       
[  4]   5.00-6.00   sec  1.89 GBytes  16.2 Gbits/sec    0   8.24 MBytes       
[  4]   6.00-7.00   sec  1.90 GBytes  16.3 Gbits/sec    0   8.24 MBytes       
[  4]   7.00-8.00   sec  1.88 GBytes  16.1 Gbits/sec    0   8.24 MBytes       
[  4]   8.00-9.00   sec  1.88 GBytes  16.2 Gbits/sec    0   8.24 MBytes       
[  4]   9.00-10.00  sec  1.87 GBytes  16.1 Gbits/sec    0   8.24 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  18.9 GBytes  16.2 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  18.9 GBytes  16.2 Gbits/sec                  receiver

iperf Done.
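
For completeness, the server side is just iperf3 listening with default options on 10.0.4.1, something like:
iperf3 -s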

Here is the shell command I used to create the volume with the RDMA transport type:
gluster volume create vol_home replica 2 transport rdma,tcp 
ib-storage1:/export/brick_home/brick1/ ib-storage2:/export/brick_home/brick1/ 
ib-storage3:/export/brick_home/brick1/ ib-storage4:/export/brick_home/brick1/ 
ib-storage1:/export/brick_home/brick2/ ib-storage2:/export/brick_home/brick2/ 
ib-storage3:/export/brick_home/brick2/ ib-storage4:/export/brick_home/brick2/ 
force
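
For what it's worth, I can also check whether each brick exposes an RDMA port at all (if I understand correctly, "gluster volume status" shows a separate RDMA Port column when the rdma transport is enabled):
gluster volume status vol_home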

and below is the current volume information:
[root@lucifer ~]# gluster volume info vol_home
 
Volume Name: vol_home
Type: Distributed-Replicate
Volume ID: f6ebcfc1-b735-4a0e-b1d7-47ed2d2e7af6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_home/brick1
Brick2: ib-storage2:/export/brick_home/brick1
Brick3: ib-storage3:/export/brick_home/brick1
Brick4: ib-storage4:/export/brick_home/brick1
Brick5: ib-storage1:/export/brick_home/brick2
Brick6: ib-storage2:/export/brick_home/brick2
Brick7: ib-storage3:/export/brick_home/brick2
Brick8: ib-storage4:/export/brick_home/brick2
Options Reconfigured:
performance.stat-prefetch: on
performance.flush-behind: on
features.default-soft-limit: 90%
features.quota: on
diagnostics.brick-log-level: CRITICAL
auth.allow: localhost,127.0.0.1,10.*
nfs.disable: on
performance.cache-size: 64MB
performance.write-behind-window-size: 1MB
performance.quick-read: on
performance.io-cache: on
performance.io-thread-count: 64
nfs.enable-ino32: on

and below is my mount command:
mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 
ib-storage1:vol_home /home
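
I have also read that, for a tcp,rdma volume, the RDMA transport can be forced on the client by mounting the volume name with a ".rdma" suffix instead of the transport option, so I may also try something like:
mount -t glusterfs -o direct-io-mode=disable,enable-ino32 ib-storage1:/vol_home.rdma /home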

I don't get any error with the RDMA option, but the transport type silently falls 
back to TCP.
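
To double-check which transport is really in use, I suppose I can grep the client mount log and look at the open sockets, for example:
grep -iE "rdma|transport" /var/log/glusterfs/home.log | tail
ss -tnp | grep glusterfs    # established TCP connections here would confirm the TCP fallback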

Did I make any mistake in my settings?

Can you tell me more about block size and other tuning I should apply to my RDMA 
volumes?
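
Regarding block size, following your advice below about the 64k-1024k sweet spot, I guess I can compare write throughput on the mounted volume with something like this (file names are only for illustration):
dd if=/dev/zero of=/home/ddtest_1M bs=1024k count=10240 conv=fdatasync    # 10 GiB written in 1 MiB blocks
dd if=/dev/zero of=/home/ddtest_64k bs=64k count=163840 conv=fdatasync    # 10 GiB written in 64 KiB blocks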

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

On 8 June 2015, at 18:22, Ben Turner <[email protected]> wrote:

> ----- Original Message -----
>> From: "Geoffrey Letessier" <[email protected]>
>> To: "Ben Turner" <[email protected]>
>> Cc: "Pranith Kumar Karampuri" <[email protected]>, 
>> [email protected]
>> Sent: Monday, June 8, 2015 8:37:08 AM
>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>> 
>> Hello,
>> 
>> Do you know any more about this?
>> 
>> In addition, do you know how to "activate" RDMA for my volume with
>> Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type
>> option (on both the server and client side), but I notice all streams are using
>> the TCP stack, and my bandwidth never exceeds 2.0-2.5 Gb/s (250-300 MB/s).
> 
> That is a little slow for the HW you described.  Can you check what you get 
> with iperf just between the clients and servers? https://iperf.fr/  With 
> replica 2 and 10G NW you should see ~400 MB / sec sequential writes and ~600 
> MB / sec reads.  Can you send me the output from gluster v info?  You specify 
> RDMA volumes at create time by running gluster v create blah transport rdma, 
> did you specify RDMA when you created the volume?  What block size are you 
> using in your tests?  1024 KB writes perform best with glusterfs; as the 
> block size gets smaller, perf will drop a little bit.  I wouldn't write in 
> anything under 4k blocks; the sweet spot is between 64k and 1024k.
> 
> -b
> 
>> 
>> Thanks in advance,
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: [email protected]
>> 
>>> On 2 June 2015, at 23:45, Geoffrey Letessier <[email protected]>
>>> wrote:
>>> 
>>> Hi Ben,
>>> 
>>> I just checked my messages log files, on both the client and the server, and
>>> I don't find any of the hung tasks you noticed on yours.
>>> 
>>> As you can read below, I don't see the performance issue with a simple dd, but
>>> I think my issue concerns sets of small files (tens of thousands or more)…
>>> 
>>> [root@nisus test]# ddt -t 10g /mnt/test/
>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /mnt/test/ddt.8362 ... done.
>>> 10240MiB    KiB/s  CPU%
>>> Write      114770     4
>>> Read        40675     4
>>> 
>>> for info: /mnt/test is the single (v2) GlusterFS volume
>>> 
>>> [root@nisus test]# ddt -t 10g /mnt/fhgfs/
>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>> 10240MiB    KiB/s  CPU%
>>> Write      102591     1
>>> Read        98079     2
>>> 
>>> Do you have an idea how to tune/optimize the performance settings and/or the
>>> TCP settings (MTU, etc.)?
>>> 
>>> ---------------------------------------------------------------
>>> |             |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
>>> ---------------------------------------------------------------
>>> | single      |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
>>> ---------------------------------------------------------------
>>> | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
>>> ---------------------------------------------------------------
>>> | distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
>>> ---------------------------------------------------------------
>>> | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
>>> ---------------------------------------------------------------
>>> | native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
>>> ---------------------------------------------------------------
>>> | BeeGFS      |  ~3m43s |   ~15s |     ~3s |  ~1m33s |   ~46s |
>>> ---------------------------------------------------------------
>>> | single (v2) |   ~3m6s |   ~14s |    ~32s |   ~1m2s |   ~44s |
>>> ---------------------------------------------------------------
>>> for info:
>>>     -BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 
>>> servers)
>>>     - single (v2): simple gluster volume with default settings
>>> 
>>> I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS,
>>> but the rest (DU, FIND, RM) looks OK.
>>> 
>>> Thank you very much for your reply and help.
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> 
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>> On 2 June 2015, at 21:53, Ben Turner <[email protected]> wrote:
>>> 
>>>> I am seeing problems on 3.7 as well.  Can you check /var/log/messages on
>>>> both the clients and servers for hung tasks like:
>>>> 
>>>> Jun  2 15:23:14 gqac006 kernel: "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> Jun  2 15:23:14 gqac006 kernel: iozone        D 0000000000000001     0
>>>> 21999      1 0x00000080
>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082
>>>> ffff880611321c18 ffffffffa027236e
>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10
>>>> ffff88052bd1e040 ffff880611321c78
>>>> Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0
>>>> ffff880625addaf8 ffff880611321fd8
>>>> Jun  2 15:23:14 gqac006 kernel: Call Trace:
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ?
>>>> rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ?
>>>> rpc_execute+0x50/0xa0 [sunrpc]
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ?
>>>> ktime_get_ts+0xb1/0xf0
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>]
>>>> __wait_on_bit+0x5f/0x90
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>]
>>>> wait_on_page_bit+0x73/0x80
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ?
>>>> wake_bit_function+0x0/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ?
>>>> pagevec_lookup_tag+0x25/0x40
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>]
>>>> wait_on_page_writeback_range+0xfb/0x190
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>]
>>>> filemap_write_and_wait_range+0x78/0x90
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>]
>>>> vfs_fsync_range+0x7e/0x100
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>]
>>>> system_call_fastpath+0x16/0x1b
>>>> 
>>>> Do you see a perf problem with just a simple DD or do you need a more
>>>> complex workload to hit the issue?  I think I saw an issue with metadata
>>>> performance that I am trying to run down, let me know if you can see the
>>>> problem with simple DD reads / writes or if we need to do some sort of
>>>> dir / metadata access as well.
>>>> 
>>>> -b
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Geoffrey Letessier" <[email protected]>
>>>>> To: "Pranith Kumar Karampuri" <[email protected]>
>>>>> Cc: [email protected]
>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>>> 
>>>>> Hi Pranith,
>>>>> 
>>>>> I'm sorry, but I cannot give you any comparison, because it would be
>>>>> distorted by the fact that in my production HPC cluster the network
>>>>> technology is InfiniBand QDR and my volumes are quite different (each brick
>>>>> is a RAID6 of 12x2TB, with 2 bricks per server and 4 servers in my pool).
>>>>> 
>>>>> Regarding your request, you can find all the expected results in the
>>>>> attachments; hopefully they can help you solve this serious performance
>>>>> issue (maybe I need to play with the glusterfs parameters?).
>>>>> 
>>>>> Thank you very much in advance,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2 June 2015, at 10:09, Pranith Kumar Karampuri <[email protected]>
>>>>> wrote:
>>>>> 
>>>>> Hi Geoffrey,
>>>>> Since you are saying it happens on all types of volumes, let's do the
>>>>> following:
>>>>> 1) Create a dist-repl volume
>>>>> 2) Set the options etc you need.
>>>>> 3) enable gluster volume profile using "gluster volume profile <volname>
>>>>> start"
>>>>> 4) run the work load
>>>>> 5) give output of "gluster volume profile <volname> info"
>>>>> 
>>>>> Repeat the steps above on new and old version you are comparing this
>>>>> with.
>>>>> That should give us insight into what could be causing the slowness.
>>>>> 
>>>>> Pranith
>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>> 
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> I have a crash-test cluster where I've tested the new version of GlusterFS
>>>>> (v3.7) before upgrading my HPC cluster in production.
>>>>> But… all my tests show very, very low performance.
>>>>> 
>>>>> For my benchmarks, as you can read below, I run some actions (untar, du,
>>>>> find, tar, rm) on the Linux kernel sources, dropping caches, each on
>>>>> distributed, replicated, distributed-replicated and single (single brick)
>>>>> volumes and on the native FS of one brick.
>>>>> 
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz;
>>>>> sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/;
>>>>> echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l;
>>>>> echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz
>>>>> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz
>>>>> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>> 
>>>>> And here are the process times:
>>>>> 
>>>>> ---------------------------------------------------------------
>>>>> |             |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
>>>>> ---------------------------------------------------------------
>>>>> | single      |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
>>>>> ---------------------------------------------------------------
>>>>> | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
>>>>> ---------------------------------------------------------------
>>>>> | distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
>>>>> ---------------------------------------------------------------
>>>>> | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
>>>>> ---------------------------------------------------------------
>>>>> | native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
>>>>> ---------------------------------------------------------------
>>>>> 
>>>>> I get the same results, whether with default configurations or with custom
>>>>> configurations.
>>>>> 
>>>>> If I look at the output of the ifstat command, I can see that my IO write
>>>>> throughput never exceeds 3 MB/s...
>>>>> 
>>>>> The native EXT4 FS seems to be faster (roughly 15-20%, but no more) than the
>>>>> XFS one.
>>>>> 
>>>>> My [test] storage cluster config is composed of 2 identical servers (dual-CPU
>>>>> Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
>>>>> 
>>>>> My volume settings:
>>>>> single: 1server 1 brick
>>>>> replicated: 2 servers 1 brick each
>>>>> distributed: 2 servers 2 bricks each
>>>>> dist-repl: 2 bricks in the same server and replica 2
>>>>> 
>>>>> All seems to be OK in gluster status command line.
>>>>> 
>>>>> Do you have an idea why I am getting such bad results?
>>>>> Thanks in advance.
>>>>> Geoffrey
>>>>> -----------------------------------------------
>>>>> Geoffrey Letessier
>>>>> 
>>>>> Responsable informatique & ingénieur système
>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
