Re: [Gluster-users] Quota issue

Vijaikumar M Tue, 09 Jun 2015 01:05:52 -0700


On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:

Hi,

Yes of course:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
cl-storage1: 1608522280    /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616    /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836    /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808    /export/brick_home/brick2/amyloid_team
The sum is 6444864540 (around 6.4-6.5TB), while quota list displays 7.7TB. So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which is far too large, isn't it?
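Note that GNU `du -s` counts in 1 KiB (1024-byte) blocks by default, so the raw sum is not a byte count and reading its digits directly as TB is only approximate. A small Python sketch, assuming the four lines above are the complete output and that `du` used its default block size, converts the total:

```python
# Sum the per-brick `du -s` figures quoted above and express them in larger units.
# Assumption: GNU du printed 1 KiB (1024-byte) blocks, its default.
DU_OUTPUT = """\
cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
"""

def total_kib(du_output: str) -> int:
    """Add up the block counts (second field) of pdsh-prefixed `du -s` lines."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            total += int(fields[1])
    return total

kib = total_kib(DU_OUTPUT)
size_bytes = kib * 1024
print(f"{kib} KiB")
print(f"{size_bytes / 1024**4:.2f} TiB (powers of 1024)")
print(f"{size_bytes / 1000**4:.2f} TB (powers of 1000)")
```

This comes to about 6.0 TiB, or about 6.6 TB in decimal units, so the size of the gap to the 7.7TB shown by `quota list` depends on which convention that command uses; it is worth confirming before comparing the two numbers.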
In addition, since the quota was exceeded, I notice a lot of files like the following:
[root@lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh
Note the 'T' at the end of the permissions and the file size of 0 B.
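Zero-byte entries whose mode is shown as `---------T` (only the sticky bit set) match the pattern the GlusterFS distribute (DHT) translator uses for link files: placeholders left on one brick that point to the brick actually holding the data, typically after a rename or a rebalance. That is an interpretation, not something confirmed in this thread, but it would be consistent with the two « same » files reported further down (one link file, one real, now empty, data file). A Python sketch that flags such entries when run directly on a brick (not through the mount); the function names are hypothetical, and reading the `trusted.glusterfs.dht.linkto` extended attribute needs root on Linux:

```python
import os
import stat

# Extended attribute DHT stores on link files; its value names the subvolume holding the data.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"

def looks_like_dht_linkfile(st_mode: int, st_size: int) -> bool:
    """True for a zero-length regular file whose only permission bit is the sticky bit."""
    return (
        stat.S_ISREG(st_mode)
        and st_size == 0
        and stat.S_IMODE(st_mode) == stat.S_ISVTX
    )

def linkto_target(path: str):
    """Return the linkto xattr as text, or None if it is absent or cannot be read."""
    try:
        return os.getxattr(path, LINKTO_XATTR).rstrip(b"\x00").decode()
    except (OSError, AttributeError):
        # AttributeError: os.getxattr only exists on Linux.
        return None

def scan_brick(root: str):
    """Yield (path, linkto) for every candidate link file under a brick directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the brick's internal .glusterfs metadata tree.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if looks_like_dht_linkfile(st.st_mode, st.st_size):
                yield path, linkto_target(path)
```

Usage, on a storage server: `for path, target in scan_brick("/export/brick_home/brick1/amyloid_team"): print(target, path)`.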

And yesterday, some files were duplicated, but they no longer are...
The worst part is that, previously, all these files were OK. In other words, exceeding the quota caused file or content deletion or corruption. What can I do to prevent this situation in the future? I guess I cannot do anything to roll back the current situation, right?


Hi Geoffrey,

I tried re-creating the problem.

Here is the behaviour of the vi editor.

When a file is saved in vi, it creates a backup file under the home directory and then opens the original file with the 'O_TRUNC' flag; hence the file was truncated.



Here is the strace of vi when it gets the 'EDQUOT' error:

open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
write(3, "line one\nline two\n", 18)    = 18
fsync(3)                                = 0
close(3)                                = -1 EDQUOT (Disk quota exceeded)
chmod("hello", 0100644)                 = 0
open("/root/hello~", O_RDONLY)          = 3
*open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7*
read(3, "line one\n", 256)              = 9
write(7, "line one\n", 9)               = 9
read(3, "", 256)                        = 0
close(7)                                = -1 EDQUOT (Disk quota exceeded)
close(3)                                = 0
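The trace shows why the content was lost: the original is truncated by `open(..., O_TRUNC)` before the new data is known to be stored, and the quota error only surfaces at `close()`. A common way to avoid this, sketched below in Python and not what vi itself does, is to write to a temporary file in the same directory, `fsync` it, check that `close` succeeds, and only then rename it over the original. The function name `replace_atomically` is hypothetical; the temporary file is charged to the same quota, so it can still fail, but the original is left intact when it does.

```python
import os
import stat
import tempfile

def replace_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data`; if any step fails, the original file is left untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            view = memoryview(data)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Quota errors may be reported here...
            os.fsync(fd)
        finally:
            # ...or only at close(), as in the strace above; os.close() then raises OSError.
            os.close(fd)
        if os.path.exists(path):
            # mkstemp() creates the file with mode 0600; carry over the original permissions.
            os.chmod(tmp_path, stat.S_IMODE(os.stat(path).st_mode))
        # Only now is the original replaced, by an atomic rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure: remove the temporary file and re-raise; the original is unchanged.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Ownership and extended attributes are not carried over in this sketch.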

To recover the truncated file, please check whether there is a backup file 'remd_115.sh~' under '~/' or in the same directory as the file. If it exists, you can copy it back.
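A minimal Python sketch of that lookup, checking the home directory first and then the file's own directory for a vi-style `name~` backup. The helper names are hypothetical, skipping empty backups is a choice of this sketch, and because copying the backup back writes to the same volume, it may fail again while the quota is still exceeded.

```python
import os
import shutil
from typing import Optional

def find_vi_backup(path: str, home: Optional[str] = None) -> Optional[str]:
    """Return the first non-empty 'name~' backup of `path` found, or None."""
    home = home or os.path.expanduser("~")
    backup_name = os.path.basename(path) + "~"
    candidates = [
        os.path.join(home, backup_name),                                    # e.g. ~/remd_115.sh~
        os.path.join(os.path.dirname(os.path.abspath(path)), backup_name),  # next to the file
    ]
    for candidate in candidates:
        if os.path.isfile(candidate) and os.path.getsize(candidate) > 0:
            return candidate
    return None

def restore_from_backup(path: str, home: Optional[str] = None) -> bool:
    """Copy a found backup over `path`; return False if there is no usable backup."""
    backup = find_vi_backup(path, home)
    if backup is None:
        return False
    # Writes to the same volume, so this can raise OSError (e.g. EDQUOT) while over quota.
    shutil.copy2(backup, path)
    return True
```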


Thanks,
Vijay

Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT Manager & Systems Engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]
On 9 June 2015 at 09:01, Vijaikumar M <[email protected]> wrote:
On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du on each brick and the « quota list » output, as you can see below:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T    /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T    /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T    /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T    /export/brick_home/brick2/amyloid_team
[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
          Path                   Hard-limit  Soft-limit   Used   Available
--------------------------------------------------------------------------------
/amyloid_team                      9.0TB         90%       7.8TB    1.2TB
As you can see, the sum over all bricks gives me roughly 6.4TB and « quota list » around 7.8TB, so there is a difference of 1.4TB that I'm not able to explain. Do you have any idea?
There were a few issues in how quota accounts for sizes; we have fixed some of them in 3.7. Also, the -h option rounds off the values; can you please provide the output of 'du' without -h?
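The rounding point can be quantified. Each `1,6T` above is printed with a single decimal (and a decimal comma, from the French locale), so on its own it only pins the real size to a band about 0.1 TiB wide. A Python sketch, assuming a worst-case error of one display step per value (the exact rounding rule is not stated in the thread), shows how wide the band for the four-brick total becomes:

```python
# Interval arithmetic on the rounded `du -sh` figures quoted above.
# Assumption: each printed value may be off by up to one display step (0.1 TiB).
STEP_TIB = 0.1
readings = ["1,6T", "1,6T", "1,6T", "1,6T"]   # as printed, with a decimal comma

def parse_tib(text: str) -> float:
    """Parse a du -h style value ending in 'T' (powers of 1024) that uses a decimal comma."""
    return float(text.rstrip("T").replace(",", "."))

values = [parse_tib(r) for r in readings]
low = sum(v - STEP_TIB for v in values)
high = sum(v + STEP_TIB for v in values)
print(f"total between {low:.1f} and {high:.1f} TiB")
```

Under that assumption the total could lie anywhere from about 6.0 to 6.8 TiB, which is why the unrounded numbers are needed for a meaningful comparison with `quota list`.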
Thanks,
Geoffrey
On 8 June 2015 at 14:30, Geoffrey Letessier <[email protected]> wrote:
Hello,
With GlusterFS version 3.5.3, I ran into a strange issue this morning when writing a file while the quota was exceeded.
A person in my lab, whose quota was exceeded (but she didn't know it), tried to modify a file but, because of the exceeded quota, she was unable to and decided to exit vi. Now her file is empty, as you can see below:
We suspect 'vi' might have created a temporary file before writing to the file. We are working on re-creating this problem and will update you.
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
In addition, I don't understand why, given that my volume is a distributed volume over replicated pairs (cl-storage[1,3] are replicated only on cl-storage[2,4]), I have two « same » files (same full path) on two different bricks (as you can see above).
Thanks in advance for your help and clarification.
Geoffrey
On 2 June 2015 at 23:45, Geoffrey Letessier <[email protected]> wrote:
Hi Ben,
I just checked the messages log files on both the client and the server, and I don't find any hung tasks like the ones you noticed on yours.
As you can see below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands or even more).
[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
10240MiB    KiB/s  CPU%
Write      114770     4
Read        40675     4

For info: /mnt/test is the single (v2) GlusterFS volume.

[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
10240MiB    KiB/s  CPU%
Write      102591     1
Read        98079     2
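To put the `ddt` figures (in KiB/s) in context, a Python sketch converting them to MB/s and comparing them with the raw Gigabit Ethernet line rate of 125 MB/s. It assumes the client reaches these volumes over Gigabit Ethernet, like the test cluster described in the original message, and ignores protocol overhead, so it is only a back-of-the-envelope check.

```python
# Raw Gigabit Ethernet line rate in bytes per second (no protocol overhead).
GIGABIT_BYTES_PER_S = 1_000_000_000 / 8

# ddt results quoted above, in KiB/s.
results_kib_s = {
    ("/mnt/test", "write"): 114770,
    ("/mnt/test", "read"): 40675,
    ("/mnt/fhgfs", "write"): 102591,
    ("/mnt/fhgfs", "read"): 98079,
}

def kib_s_to_mb_s(kib_s: float) -> float:
    """Convert KiB/s to decimal MB/s."""
    return kib_s * 1024 / 1_000_000

for (mount, op), kib_s in results_kib_s.items():
    mb_s = kib_s_to_mb_s(kib_s)
    share = kib_s * 1024 / GIGABIT_BYTES_PER_S
    print(f"{mount:<11} {op:<5} {mb_s:6.1f} MB/s  ({share:.0%} of 1 Gb/s)")
```

On that assumption, the /mnt/test write runs at about 94% of the raw line rate, while its read reaches only about a third of it, which points at the read path rather than the network for large sequential I/O.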
Do you have any idea how to tune/optimize the performance settings and/or TCP settings (MTU, etc.)?
-----------------------------------------------------------
| Volume type | UNTAR  | DU    | FIND   | TAR    | RM     |
-----------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
-----------------------------------------------------------
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
-----------------------------------------------------------
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
-----------------------------------------------------------
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
-----------------------------------------------------------
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
-----------------------------------------------------------
| BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
-----------------------------------------------------------
| single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
-----------------------------------------------------------
for info:
- BeeGFS: a distributed FS (4 bricks, 2 bricks per server, 2 servers)
- single (v2): a simple gluster volume with default settings
I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks OK.
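The approximate timings are easier to compare as ratios. A Python sketch that parses the `~XmYs` values from the table above, taken as given, and expresses each result as a multiple of the native FS time for the same operation:

```python
import re

def to_seconds(text: str) -> int:
    """Convert a timing such as '~3m45s', '~1m4s' or '~43s' to seconds."""
    match = re.fullmatch(r"~?(?:(\d+)m)?(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised timing: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds

COLUMNS = ["UNTAR", "DU", "FIND", "TAR", "RM"]
TABLE = {
    "single":      ["~3m45s", "~43s", "~47s", "~3m10s", "~3m15s"],
    "replicated":  ["~5m10s", "~59s", "~1m6s", "~1m19s", "~1m49s"],
    "distributed": ["~4m18s", "~41s", "~57s", "~2m24s", "~1m38s"],
    "dist-repl":   ["~8m18s", "~1m4s", "~1m11s", "~1m24s", "~2m40s"],
    "BeeGFS":      ["~3m43s", "~15s", "~3s", "~1m33s", "~46s"],
    "single (v2)": ["~3m6s", "~14s", "~32s", "~1m2s", "~44s"],
}
NATIVE = ["~11s", "~4s", "~2s", "~56s", "~10s"]

native_s = [to_seconds(t) for t in NATIVE]
print(f"{'':<12}" + "".join(f"{c:>8}" for c in COLUMNS))
for name, row in TABLE.items():
    # Slowdown factor relative to the native brick filesystem.
    ratios = [to_seconds(t) / base for t, base in zip(row, native_s)]
    print(f"{name:<12}" + "".join(f"{r:>7.1f}x" for r in ratios))
```

For example, the dist-repl untar takes about 45 times as long as on the native FS, while TAR stays within a factor of about 1.5 to 3.5 for the GlusterFS volumes, which fits the suspicion that small-file metadata operations are the bottleneck.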
Thank you very much for your reply and help.
Geoffrey
On 2 June 2015 at 21:53, Ben Turner <[email protected]> wrote:
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
Jun  2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  2 15:23:14 gqac006 kernel: iozone D 00000000000000010 21999 1 0x00000080
Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun  2 15:23:14 gqac006 kernel: Call Trace:
Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
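Checking many log lines for such reports is easy to script. A Python sketch, assuming the default syslog path `/var/log/messages` mentioned above, that flags the `INFO: task <name>:<pid> blocked for more than N seconds` headline the kernel's hung-task detector prints, as well as the `hung_task_timeout_secs` hint quoted in the excerpt; the function names are hypothetical:

```python
import re

# Headline printed by the kernel hung-task detector, plus the hint line that follows it.
HUNG_TASK_PATTERNS = [
    re.compile(r"INFO: task \S+:\d+ blocked for more than \d+ seconds"),
    re.compile(r"hung_task_timeout_secs"),
]

def find_hung_tasks(lines):
    """Return (line_number, text) for every line that looks like part of a hung-task report."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in HUNG_TASK_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits

def scan_log(path="/var/log/messages"):
    """Scan a syslog file; on most systems reading it requires root."""
    with open(path, errors="replace") as handle:
        return find_hung_tasks(handle)
```

Usage, on each client and server: `for number, text in scan_log(): print(number, text)`.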
Do you see a perf problem with just a simple dd, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple dd reads/writes or if we need to do some sort of dir/metadata access as well.
-b

----- Original Message -----
From: "Geoffrey Letessier" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: [email protected]
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

Hi Pranith,
I'm sorry, but I cannot give you any comparison, because it would be distorted by the fact that my production HPC cluster uses InfiniBand QDR networking and my volumes there are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).
Concerning your request, you can find all the expected results in the attachments; I hope they help you solve this serious performance issue (maybe I need to play with glusterfs parameters?).

Thank you very much in advance,
Geoffrey
On 2 June 2015 at 10:09, Pranith Kumar Karampuri <[email protected]> wrote:

Hi Geoffrey,
Since you say it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume.
2) Set the options etc. you need.
3) Enable volume profiling using "gluster volume profile <volname> start".
4) Run the workload.
5) Give the output of "gluster volume profile <volname> info".
Repeat the steps above on both the new version and the old version you are comparing it with.
That should give us insight into what could be causing the slowness.
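Steps 3 to 5 can be wrapped so that the profile is always collected, even if the workload fails. A Python sketch that shells out to the `gluster volume profile <volname> start|info|stop` subcommands; the helper names, the `workload` callable and the output file are hypothetical, and stopping profiling at the end is an addition of this sketch that is not part of the steps above.

```python
import subprocess

def profile_commands(volname: str):
    """gluster CLI invocations for steps 3 and 5, plus switching profiling off afterwards."""
    return {
        "start": ["gluster", "volume", "profile", volname, "start"],
        "info": ["gluster", "volume", "profile", volname, "info"],
        "stop": ["gluster", "volume", "profile", volname, "stop"],
    }

def profile_workload(volname: str, workload, outfile: str) -> None:
    """Start profiling, run `workload()` (step 4), save the profile info, then stop profiling."""
    cmds = profile_commands(volname)
    subprocess.run(cmds["start"], check=True)
    try:
        workload()
        info = subprocess.run(cmds["info"], check=True, capture_output=True, text=True)
        with open(outfile, "w") as handle:
            handle.write(info.stdout)
    finally:
        # Best effort: do not mask an earlier error if stopping fails.
        subprocess.run(cmds["stop"], check=False)
```

Running it once per GlusterFS version with the same `workload` gives two profile dumps that can be compared side by side.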

Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:


Dear all,
I have a crash-test cluster where I've tested the new version of GlusterFS (v3.7) before upgrading my production HPC cluster.
But all my tests show very, very low performance.
For my benchmarks, as you can see below, I perform some actions (untar, du, find, tar, rm) on the Linux kernel sources, dropping caches, on each of the distributed, replicated, distributed-replicated and single (single brick) volumes, and on the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
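The same benchmark can be scripted so that every run is timed identically. A Python sketch, assuming it runs as root from a directory on the volume under test; it reuses the command lines above through the shell, and one deliberate deviation is a `sync` before each cache drop, since writing to `/proc/sys/vm/drop_caches` only discards clean pages. The names `timed`, `run_all` and `STEPS` are hypothetical.

```python
import subprocess
import time

def drop_caches() -> None:
    """Flush dirty pages, then drop page cache, dentries and inodes (requires root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as handle:
        handle.write("3\n")

def timed(command: str, clear_caches: bool = True) -> float:
    """Run a shell command between cache drops and return its wall-clock time in seconds."""
    if clear_caches:
        drop_caches()
    start = time.monotonic()
    subprocess.run(command, shell=True, check=True)
    elapsed = time.monotonic() - start
    if clear_caches:
        drop_caches()
    return elapsed

# The five operations from the commands above; UNTAR keeps its sync inside the timed part.
STEPS = [
    ("UNTAR", "tar xJf ~/linux-4.1-rc5.tar.xz && sync"),
    ("DU", "du -sh linux-4.1-rc5/"),
    ("FIND", "find linux-4.1-rc5/ | wc -l"),
    ("TAR", "tar czf linux-4.1-rc5.tgz linux-4.1-rc5/"),
    ("RM", "rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/"),
]

def run_all(steps=STEPS):
    """Run every step in order and return {name: seconds}."""
    return {name: timed(command) for name, command in steps}
```

Usage: `print(run_all())` from the mount point of each volume type.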

And here are the process times:

-----------------------------------------------------------
| Volume type | UNTAR  | DU    | FIND   | TAR    | RM     |
-----------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
-----------------------------------------------------------
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
-----------------------------------------------------------
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
-----------------------------------------------------------
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
-----------------------------------------------------------
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
-----------------------------------------------------------
I get the same results with both default and custom configurations.
Looking at ifstat, I can see that my write I/O never exceeds 3 MB/s...
The native EXT4 FS seems to be faster than XFS (by roughly 15-20%, but no more).
My [test] storage cluster consists of 2 identical servers (dual-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gigabit Ethernet).

My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks on the same server, replica 2

Everything looks OK in the gluster status command output.

Do you have any idea why I get such bad results?
Thanks in advance.
Geoffrey
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users



