In addition to my last mail, I forgot to mention some other matters:
- in my logs, I also notice lines such as the following:
[2015-07-31 22:13:00.574361] I [MSGID: 114047]
[client-handshake.c:1225:client_setvolume_cbk] 0-vol_home-client-7: Server and
Client lk-version numbers are not same, reopening the fds
[2015-07-31 22:13:00.574507] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vol_home-client-7: Server
lk version = 1
- at least with v3.7.2 (I didn't test v3.7.3), the failover
mechanism no longer seems operational. Previously, when a storage node was
down, all read/write operations remained functional thanks to the failover
mechanism integrated in GlusterFS. At least with 3.7.2, when a brick or a node
is down, all operations hang (read, write, mount, etc.), even if the data
is not located on the failed server/brick…
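By the way, this is how I put a bound on the hanging calls while testing (a minimal sketch, not our exact procedure: timeout(1) is from coreutils and exits with status 124 when it has to kill the command; the real command on our side would be something like a stat on the mount point, and here the hang is simulated with sleep):

```shell
#!/bin/sh
# Bound a possibly-hanging filesystem operation with timeout(1).
# On the cluster the real call would be e.g. `timeout 5 stat /mnt/vol_home`
# (the mount point is an example); here `sleep 30` simulates the hung call.
timeout 2 sleep 30
status=$?
if [ "$status" -eq 124 ]; then
  echo "operation hung (killed after timeout)"
else
  echo "operation completed (exit $status)"
fi
```

With a brick down, every such probe on the affected volume times out instead of returning.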
I wonder whether some of the previously mentioned issues (in particular those
concerning quotas) originate from the quota crash I met with the previous
version we used (v3.5.3): some files (caches, db or…) may have been kept and
were not destroyed/replaced when I deleted all volumes and upgraded GlusterFS
from 3.5.3 to 3.7.2.
That is a lot of bugs/issues for a production environment, isn't it? Our
local scientific production has been stopped for roughly 7 weeks because of this.
Thanks in advance for your help.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]
Le 16 août 2015 à 01:35, Geoffrey Letessier <[email protected]> a
écrit :
> Hi,
>
> Since I upgraded GlusterFS from 3.5.3 to 3.7.x, trying to solve my quota
> miscalculation and poor performance (as advised by the user support team),
> we have been out of production for roughly 7 weeks because of the many
> v3.7.x issues we are meeting:
>
> - T-file apparitions. I notice a lot of T files (with permissions
> ---------T) located in my brick paths. Vijay explained to me that T files
> appear when a rename is performed or when a brick is added/removed; but the
> problem is that, since I completely re-created the volume (with RAID
> initialization, etc.) and imported my data into it, I have renamed nothing
> and never added nor removed any brick.
> So why are these T files present in my new volume? For example, for my
> /derreumaux_team directory, I have 13891 real files and 704 T files in
> total across the brick paths…
> How can I clean them up while avoiding side effects?
>
> The first time I noticed this kind of file was after I had set a quota
> below the real path size, which resulted in some quota explosions (quota
> daemon failure) and T-file apparitions...
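> For reference, this is how I count them (a minimal sketch under my own
> assumptions: DHT link-to files show up as zero-byte files whose mode is
> exactly ---------T, i.e. octal 1000; real ones also carry the
> trusted.glusterfs.dht.linkto xattr. BRICK below is a throwaway demo
> directory, not one of my real brick paths):

```shell
#!/bin/sh
# Count DHT "link-to" (T) files under a brick path: zero-byte files
# whose mode is exactly the sticky bit (octal 1000), shown as ---------T.
# BRICK is a fabricated demo directory standing in for a real brick path.
BRICK=$(mktemp -d)
touch "$BRICK/example"
chmod 1000 "$BRICK/example"   # sticky bit only, no rwx bits
count=$(find "$BRICK" -type f -perm 1000 -size 0 | wc -l)
echo "$count T-file(s) found"   # -> 1 T-file(s) found
rm -rf "$BRICK"
```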
>
> - 7006 files in split-brain status after transferring the data back
> (30TB, 6.2M files) from a backup server into my freshly created volume.
> Thanks to Mathieu Chateau, who put me on the right road (GFID vs real file
> path), this problem has been fixed manually.
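> The trick, for anyone else hitting this: on a brick, the entry
> .glusterfs/<aa>/<bb>/<gfid> is a hard link to the real file, so the real
> path can be recovered through the shared inode. A minimal sketch (the
> brick layout and the GFID below are fabricated for the demo; -samefile
> is a GNU find option):

```shell
#!/bin/sh
# Map a GFID entry under .glusterfs/ back to the real file path on a brick.
# Both names are hard links to the same inode, so `find -samefile` finds it.
# BRICK and the GFID are fabricated here purely for demonstration.
BRICK=$(mktemp -d)
mkdir -p "$BRICK/.glusterfs/ab/cd" "$BRICK/team"
echo data > "$BRICK/team/results.txt"
ln "$BRICK/team/results.txt" \
  "$BRICK/.glusterfs/ab/cd/abcd1234-0000-0000-0000-000000000000"
# Skip the .glusterfs tree itself, then match by inode:
real=$(find "$BRICK" -name .glusterfs -prune -o \
  -samefile "$BRICK/.glusterfs/ab/cd/abcd1234-0000-0000-0000-000000000000" \
  -print)
echo "$real"
rm -rf "$BRICK"
```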
>
> - log issue. After creating only one file (35GB), I noticed more
> than 186000 new lines in the brick log files. I can stop them by setting
> brick-log-level to CRITICAL, but I guess this issue gravely impacts IO
> performance and throughput. Vijay told me he has fixed this problem in the
> code, but I apparently have to wait for the next release to take advantage
> of it… Very nice for production!
>
> Actually, if I don't set brick-log-level to CRITICAL, I can fill my /var
> partition (10GB) in less than 1 day running some tests/benchmarks on the
> volume…
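> For the record, the commands I use for this (assuming the option name as
> in the 3.7 CLI; vol_home is our volume):

```shell
# Quiet the brick logs until the fix is released:
gluster volume set vol_home diagnostics.brick-log-level CRITICAL
# Restore the default afterwards:
gluster volume reset vol_home diagnostics.brick-log-level
```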
>
> - volume healing issue: slightly fewer than 14000 files were in a bad
> state (# gluster volume heal vol_home info), and forcing a new heal of my
> volume made no change. Thanks to Krutika and Pranith, this problem is now
> fixed.
>
> - du/df/stat/etc. hangs caused by the RDMA protocol. This problem
> seems to no longer occur since I upgraded GlusterFS from v3.7.2 to v3.7.3.
> It was probably due to the brick crashes we had with the RDMA transport
> type (a few minutes or a few days after [re]starting the volume). I
> noticed it only with v3.7.2.
>
> - quota problem: after successfully forcing the quota re-calculation
> (with a simple du for each defined quota), and after a couple of days with
> good values, the quota daemon failed again (some quota explosions, etc.)
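> For reference, the forced re-calculation looks like this (a sketch of
> what I ran; /mnt/vol_home is an example client mount point and
> /derreumaux_team one of the directories carrying a quota):

```shell
# Crawling the directory through a client mount refreshes the accounting:
du -s /mnt/vol_home/derreumaux_team
# Then compare with what the quota daemon reports:
gluster volume quota vol_home list /derreumaux_team
```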
>
> - a lot of warnings during tar operations on replicated volumes:
> tar: linux-4.1-rc6/sound/soc/codecs/wm8962.c: file changed as we read it
>
>
> - low I/O performance and throughput:
>
> 1- if I enable the quota feature, my IO throughput is divided by
> 2, so for the moment I have disabled it… (only since I upgraded
> GlusterFS to 3.7.x)
> 2- since I upgraded GlusterFS from 3.5.3 to 3.7.3, my I/O
> performance and throughput are lower than before, as you can read below
> (keeping in mind I have disabled the quota feature)
>
> IO operation tests with a Linux kernel archive (80MB tarball, ~53000
> files, 550MB uncompressed):
> ------------------------------------------------------------------------
> | PRODUCTION HARDWARE |
> ------------------------------------------------------------------------
> | | UNTAR | DU | FIND | GREP | TAR | RM |
> ------------------------------------------------------------------------
> | native FS | ~16s | ~0.1s | ~0.1s | ~0.1s | ~24s | ~3s |
> ------------------------------------------------------------------------
> | GlusterFS version 3.5.3 |
> ------------------------------------------------------------------------
> | distributed | ~2m57s | ~23s | ~22s | ~49s | ~50s | ~54s |
> ------------------------------------------------------------------------
> | dist-repl | ~29m56s | ~1m5s | ~1m04s | ~1m32s | ~1m31s | ~2m40s |
> ------------------------------------------------------------------------
> | GlusterFS version 3.7.3 |
> ------------------------------------------------------------------------
> | distributed | ~2m49s | ~20s | ~29s | ~58s | ~60s | ~41s |
> ------------------------------------------------------------------------
> | dist-repl | ~28m24s | ~51s | ~37s | ~1m16s | ~1m14s | ~1m17s |
> ------------------------------------------------------------------------
> *:
> - distributed: 4 bricks (2 bricks on each of 2 servers)
> - dist-repl: 4 bricks (2 bricks on each of 2 servers) per replica, 2
> replicas
> - native FS: a brick path accessed directly (XFS)
>
> And the craziest thing is that I did the same test on a crash-test storage
> cluster (2 old Dell servers, each brick a single 2TB 7.2k hard drive, 2
> bricks per server) and its performance exceeds that of the production
> hardware (4 recent servers, 2 bricks each, each brick a 24TB RAID6 with
> good LSI RAID controllers, 1 controller per brick):
> ------------------------------------------------------------------------
> | CRASHTEST HARDWARE |
> ------------------------------------------------------------------------
> | | UNTAR | DU | FIND | GREP | TAR | RM |
> ------------------------------------------------------------------------
> | native FS | ~19s | ~0.2s | ~0.1s | ~1.2s | ~29s | ~2s |
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> | single | ~3m45s | ~43s | ~47s | | ~3m10s | ~3m15s |
> ------------------------------------------------------------------------
> | single v2* | ~3m24s | ~13s | ~33s | ~1m10s | ~46s | ~48s |
> ------------------------------------------------------------------------
> | single NFS | ~23m51s | ~3s | ~1s | ~27s | ~36s | ~13s |
> ------------------------------------------------------------------------
> | replicated | ~5m10s | ~59s | ~1m6s | | ~1m19s | ~1m49s |
> ------------------------------------------------------------------------
> | distributed | ~4m18s | ~41s | ~57s | | ~2m24s | ~1m38s |
> ------------------------------------------------------------------------
> | dist-repl | ~7m1s | ~19s | ~31s | ~1m34s | ~1m26s | ~2m11s |
> ------------------------------------------------------------------------
> | FhGFS(dist) | ~3m33s | ~15s | ~2s | ~1m31s | ~1m31s | ~52s |
> ------------------------------------------------------------------------
> *: with default parameters
>
>
> Concerning throughput (for both write and read operations), on the
> production hardware it was around 600MB/s (dist-repl volume) and 1.1GB/s
> (distributed volume) with GlusterFS 3.5.3 over the TCP network transport
> type (RDMA never worked on my storage cluster before GlusterFS 3.7.x).
> Now it is around 500-600MB/s with RDMA and 150-300MB/s with TCP for the
> dist-repl volume, and around 600-700MB/s with RDMA and 500-600MB/s with
> TCP for the distributed volume.
>
> Could you help us get our HPC center back into production by solving the
> above-mentioned issues? Or do you advise me to downgrade to v3.5.3 (the
> most stable version I have known since I started using GlusterFS in
> production)? Or to move on? ;-)
>
> Thanks in advance.
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: [email protected]
>
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users