Hi Ben,

Can you tell me more about that (creation and setting steps, etc.)?

Thanks in advance.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]
On 8 June 2015 at 18:22, Ben Turner <[email protected]> wrote:

> ----- Original Message -----
>> From: "Geoffrey Letessier" <[email protected]>
>> To: "Ben Turner" <[email protected]>
>> Cc: "Pranith Kumar Karampuri" <[email protected]>, [email protected]
>> Sent: Monday, June 8, 2015 8:37:08 AM
>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>
>> Hello,
>>
>> Do you know more about this?
>>
>> In addition, do you know how to "activate" RDMA for my volume with
>> Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type
>> option (both on the server and the client side), but I notice all streams are
>> using the TCP stack, and my bandwidth never exceeds 2.0-2.5 Gb/s (250-300 MB/s).
>
> That is a little slow for the HW you described. Can you check what you get
> with iperf just between the clients and servers? https://iperf.fr/ With
> replica 2 and a 10G NW you should see ~400 MB/sec sequential writes and ~600
> MB/sec reads. Can you send me the output from gluster v info? You specify
> RDMA volumes at create time by running gluster v create blah transport rdma;
> did you specify RDMA when you created the volume? What block size are you
> using in your tests? 1024 KB writes perform best with glusterfs, and as the
> block size gets smaller perf will drop a little bit. I wouldn't write in
> anything under 4k blocks; the sweet spot is between 64k and 1024k.
>
> -b
>
>> Thanks in advance,
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>
>>> On 2 June 2015 at 23:45, Geoffrey Letessier <[email protected]> wrote:
>>>
>>> Hi Ben,
>>>
>>> I just checked my messages log files, both on the client and the server, and
>>> I don't find any hung tasks like the ones you noticed on yours.
>>>
>>> As you can read below, I don't see the performance issue with a simple DD,
>>> but I think my issue concerns sets of small files (tens of thousands, or even more)…
>>>
>>> [root@nisus test]# ddt -t 10g /mnt/test/
>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /mnt/test/ddt.8362 ... done.
>>> 10240MiB    KiB/s   CPU%
>>> Write      114770      4
>>> Read        40675      4
>>>
>>> for info: /mnt/test is the single (v2) GlusterFS volume
>>>
>>> [root@nisus test]# ddt -t 10g /mnt/fhgfs/
>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>> 10240MiB    KiB/s   CPU%
>>> Write      102591      1
>>> Read        98079      2
>>>
>>> Do you have an idea how to tune/optimize performance settings and/or TCP
>>> settings (MTU, etc.)?
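Regarding Ben's suggestion above, a minimal sketch of the raw-network check with iperf and of creating and mounting a volume over RDMA might look like this (hostnames, brick paths and the volume name are placeholders, not taken from this thread):

# check raw network throughput first, with no gluster in the path
iperf -s                   # on one server
iperf -c storage1          # on the client, pointing at that server

# create and start a replica 2 volume that uses the rdma transport
gluster volume create testvol replica 2 transport rdma \
    storage1:/export/brick1 storage2:/export/brick1
gluster volume start testvol

# mount it from a client, asking for the rdma transport
mount -t glusterfs -o transport=rdma storage1:/testvol /mnt/test

Ben's point is that the transport is fixed at volume creation time: a volume created with the default tcp transport will not carry RDMA traffic no matter what the client requests at mount time.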
>>>
>>> ---------------------------------------------------------------
>>> |              | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
>>> ---------------------------------------------------------------
>>> | single       | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>> ---------------------------------------------------------------
>>> | replicated   | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>> ---------------------------------------------------------------
>>> | distributed  | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>> ---------------------------------------------------------------
>>> | dist-repl    | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>> ---------------------------------------------------------------
>>> | native FS    | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>> ---------------------------------------------------------------
>>> | BeeGFS       | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
>>> ---------------------------------------------------------------
>>> | single (v2)  | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
>>> ---------------------------------------------------------------
>>> for info:
>>>  - BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
>>>  - single (v2): simple gluster volume with default settings
>>>
>>> I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS,
>>> but the rest (DU, FIND, RM) seems to be OK.
>>>
>>> Thank you very much for your reply and help.
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>>
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>
>>> On 2 June 2015 at 21:53, Ben Turner <[email protected]> wrote:
>>>
>>>> I am seeing problems on 3.7 as well. Can you check /var/log/messages on
>>>> both the clients and servers for hung tasks like:
>>>>
>>>> Jun  2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> Jun  2 15:23:14 gqac006 kernel: iozone        D 0000000000000001     0 21999      1 0x00000080
>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>>>> Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>>>> Jun  2 15:23:14 gqac006 kernel: Call Trace:
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>
>>>> Do you see a perf problem with just a simple DD, or do you need a more
>>>> complex workload to hit the issue? I think I saw an issue with metadata
>>>> performance that I am trying to run down; let me know if you can see the
>>>> problem with simple DD reads / writes or if we need to do some sort of
>>>> dir / metadata access as well.
>>>>
>>>> -b
>>>>
>>>> ----- Original Message -----
>>>>> From: "Geoffrey Letessier" <[email protected]>
>>>>> To: "Pranith Kumar Karampuri" <[email protected]>
>>>>> Cc: [email protected]
>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> I'm sorry, but I cannot give you any comparison, because it would be
>>>>> distorted by the fact that in my HPC cluster in production the network
>>>>> technology is InfiniBand QDR and my volumes are quite different (bricks in
>>>>> RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).
>>>>>
>>>>> Concerning your request, attached you can find all the expected results,
>>>>> hoping it can help you solve this serious performance issue (maybe I need
>>>>> to play with glusterfs parameters?).
>>>>>
>>>>> Thank you very much in advance,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>
>>>>> On 2 June 2015 at 10:09, Pranith Kumar Karampuri <[email protected]> wrote:
>>>>>
>>>>> hi Geoffrey,
>>>>> Since you are saying it happens on all types of volumes, let's do the
>>>>> following:
>>>>> 1) Create a dist-repl volume
>>>>> 2) Set the options etc. you need
>>>>> 3) Enable gluster volume profiling using "gluster volume profile <volname> start"
>>>>> 4) Run the workload
>>>>> 5) Give the output of "gluster volume profile <volname> info"
>>>>>
>>>>> Repeat the steps above on both the new and old versions you are comparing.
>>>>> That should give us insight into what could be causing the slowness.
>>>>>
>>>>> Pranith
>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I have a crash-test cluster where I've tested the new version of GlusterFS
>>>>> (v3.7) before upgrading my HPC cluster in production.
>>>>> But… all my tests show very, very low performance.
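As a concrete illustration of Pranith's profiling steps above, the commands might look roughly like this (the volume name is a placeholder):

# start collecting per-brick statistics on the volume
gluster volume profile testvol start

# ...run the untar/du/find/tar/rm workload on a client mount...

# dump the accumulated per-FOP call counts and latencies
gluster volume profile testvol info

# stop profiling once the numbers have been captured
gluster volume profile testvol stop

The "info" output lists, per brick, the number of calls and the average/min/max latency of each file operation, which is what should reveal where the small-file workload spends its time.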
>>>>>
>>>>> For my benchmarks, as you can read below, I run some actions (untar, du, find,
>>>>> tar, rm) on the Linux kernel sources, dropping caches, each on the distributed,
>>>>> replicated, distributed-replicated and single (single-brick) volumes and on the
>>>>> native FS of one brick.
>>>>>
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>
>>>>> And here are the process times:
>>>>>
>>>>> ---------------------------------------------------------------
>>>>> |              | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
>>>>> ---------------------------------------------------------------
>>>>> | single       | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>> ---------------------------------------------------------------
>>>>> | replicated   | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>> ---------------------------------------------------------------
>>>>> | distributed  | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>> ---------------------------------------------------------------
>>>>> | dist-repl    | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>> ---------------------------------------------------------------
>>>>> | native FS    | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> I get the same results with default configurations as with custom configurations.
>>>>>
>>>>> If I look at the output of the ifstat command, I note that my I/O write processes
>>>>> never exceed 3 MB/s...
>>>>>
>>>>> The EXT4 native FS seems to be faster than the XFS one (roughly 15-20%, but no more).
>>>>>
>>>>> My [test] storage cluster is composed of 2 identical servers (dual-CPU
>>>>> Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
>>>>>
>>>>> My volume settings:
>>>>> single: 1 server, 1 brick
>>>>> replicated: 2 servers, 1 brick each
>>>>> distributed: 2 servers, 2 bricks each
>>>>> dist-repl: 2 bricks in the same server and replica 2
>>>>>
>>>>> Everything seems to be OK in the gluster status command output.
>>>>>
>>>>> Do you have an idea why I obtain such bad results?
>>>>> Thanks in advance.
>>>>> Geoffrey
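For reference, the four gluster test volume layouts listed above could have been created along these lines (hostnames and brick paths are placeholders; the dist-repl brick ordering is only one possible reading of "2 bricks in the same server and replica 2", and gluster asks for "force" when both bricks of a replica pair sit on the same server):

# single: 1 server, 1 brick
gluster volume create vol-single srv1:/export/brick1

# replicated: 2 servers, 1 brick each
gluster volume create vol-repl replica 2 srv1:/export/brick1 srv2:/export/brick1

# distributed: 2 servers, 2 bricks each (4 bricks, no replication)
gluster volume create vol-dist srv1:/export/brick1 srv1:/export/brick2 \
    srv2:/export/brick1 srv2:/export/brick2

# dist-repl: replica 2 with both bricks of each pair on the same server
gluster volume create vol-distrepl replica 2 srv1:/export/brick1 srv1:/export/brick2 \
    srv2:/export/brick1 srv2:/export/brick2 force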
>>>>> -----------------------------------------------
>>>>> Geoffrey Letessier
>>>>>
>>>>> Responsable informatique & ingénieur système
>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
