Can you restart the glusterd service (first check that it was not modified to kill the bricks)?

Best Regards,
Strahil Nikolov

On Thu, Mar 16, 2023 at 8:26, Diego Zuccato <diego.zucc...@unibo.it> wrote:
OOM is just a matter of time.
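Strahil's "check it was not modified" step can be approximated without touching the running daemon: the stock glusterd.service unit sets KillMode=process, so a restart stops only the management daemon and leaves the glusterfsd brick processes alone. A minimal sketch, using a made-up unit-file excerpt in place of live `systemctl cat glusterd` output:

```shell
# Hypothetical excerpt of glusterd.service, standing in for
# `systemctl cat glusterd` output on a live node.
unit='[Service]
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
KillMode=process'

# KillMode=process means only glusterd itself is killed on stop/restart;
# brick processes survive. Anything else deserves a closer look before
# restarting.
if echo "$unit" | grep -q '^KillMode=process'; then
  echo "bricks survive restart"
else
  echo "unit modified: bricks may be killed on restart"
fi
```

On a live node the same check would be `systemctl cat glusterd | grep -i KillMode`, followed by `systemctl restart glusterd` once it looks sane.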
Today mem use is up to 177G/187 and:
# ps aux|grep glfsheal|wc -l
551
(well, one is actually the grep process, so "only" 550 glfsheal
processes). I'll take the last 5:
root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
-8<--
root@str957-clustor00:~# ps -o ppid= 3266352
3266345
root@str957-clustor00:~# ps -o ppid= 3267220
3267213
root@str957-clustor00:~# ps -o ppid= 3268076
3268069
root@str957-clustor00:~# ps -o ppid= 3269492
3269485
root@str957-clustor00:~# ps -o ppid= 3270354
3270347
root@str957-clustor00:~# ps aux|grep 3266345
root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 gluster volume heal cluster_data info summary --xml
root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345
root@str957-clustor00:~# ps aux|grep 3267213
root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 gluster volume heal cluster_data info summary --xml
root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213
root@str957-clustor00:~# ps aux|grep 3268069
root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 gluster volume heal cluster_data info summary --xml
root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069
root@str957-clustor00:~# ps aux|grep 3269485
root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 gluster volume heal cluster_data info summary --xml
root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485
root@str957-clustor00:~# ps aux|grep 3270347
root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 gluster volume heal cluster_data info summary --xml
root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347
-8<--
It seems glfsheal keeps spawning more processes. I can't rule out a
metadata corruption (or at least a desync), but it shouldn't happen...

Diego

On 15/03/2023 20:11, Strahil Nikolov wrote:
> If you don't experience any OOM, you can focus on the heals.
>
> 284 processes of glfsheal seems odd.
>
> Can you check the ppid for 2-3 randomly picked ones?
> ps -o ppid= <pid>
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Mar 15, 2023 at 9:54, Diego Zuccato <diego.zucc...@unibo.it> wrote:
> I enabled it yesterday and that greatly reduced memory pressure.
> Current volume info:
> -8<--
> Volume Name: cluster_data
> Type: Distributed-Replicate
> Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 45 x (2 + 1) = 135
> Transport-type: tcp
> Bricks:
> Brick1: clustor00:/srv/bricks/00/d
> Brick2: clustor01:/srv/bricks/00/d
> Brick3: clustor02:/srv/bricks/00/q (arbiter)
> [...]
> Brick133: clustor01:/srv/bricks/29/d
> Brick134: clustor02:/srv/bricks/29/d
> Brick135: clustor00:/srv/bricks/14/q (arbiter)
> Options Reconfigured:
> performance.quick-read: off
> cluster.entry-self-heal: on
> cluster.data-self-heal-algorithm: full
> cluster.metadata-self-heal: on
> cluster.shd-max-threads: 2
> network.inode-lru-limit: 500000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> features.quota-deem-statfs: on
> performance.readdir-ahead: on
> cluster.granular-entry-heal: enable
> features.scrub: Active
> features.bitrot: on
> cluster.lookup-optimize: on
> performance.stat-prefetch: on
> performance.cache-refresh-timeout: 60
> performance.parallel-readdir: on
> performance.write-behind-window-size: 128MB
> cluster.self-heal-daemon: enable
> features.inode-quota: on
> features.quota: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> client.event-threads: 1
> features.scrub-throttle: normal
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> config.brick-threads: 0
> cluster.lookup-unhashed: on
> config.client-threads: 1
> cluster.use-anonymous-inode: off
> diagnostics.brick-sys-log-level: CRITICAL
> features.scrub-freq: monthly
> cluster.data-self-heal: on
> cluster.brick-multiplex: on
> cluster.daemon-log-level: ERROR
> -8<--
>
> htop reports that memory usage is up to 143G; there are 602 tasks and
> 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> clustor01, and 126G/45 tasks/1574 threads on clustor02.
> I see quite a lot (284!) of glfsheal processes running on clustor00 (a
> "gluster v heal cluster_data info summary" has been running on clustor02
> since yesterday, still with no output). Shouldn't there be just one per brick?
>
> Diego
>
> On 15/03/2023 08:30, Strahil Nikolov wrote:
> > Do you use brick multiplexing ?
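The multiplexing question can be answered mechanically from the volume options already shown above; the snippet below pulls the value out of a canned two-line sample standing in for live `gluster volume info` output:

```shell
# Two sample lines standing in for `gluster volume info cluster_data`
# output (not taken live).
volinfo='cluster.brick-multiplex: on
cluster.daemon-log-level: ERROR'

# Extract the effective value of the brick-multiplex option.
echo "$volinfo" | awk -F': ' '$1 == "cluster.brick-multiplex" {print $2}'
```

On a node, `gluster volume get cluster_data cluster.brick-multiplex` reports the effective value directly, including defaults that `volume info` omits.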
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> > <diego.zucc...@unibo.it> wrote:
> > Hello all.
> >
> > Our Gluster 9.6 cluster is showing increasing problems.
> > Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores,
> > dual thread, 40 threads total], 192 GB RAM, 30x HGST HUH721212AL5200
> > [12 TB]), configured in replica 3 arbiter 1, using Debian packages
> > from the Gluster 9.x latest repository.
> >
> > It seems 192 GB of RAM are not enough to handle 30 data bricks + 15
> > arbiters, and I often had to reload glusterfsd because glusterfs
> > processes got killed for OOM.
> > On top of that, performance has been quite bad, especially after we
> > reached about 20M files. Moreover, one of the servers has had mobo
> > issues that resulted in memory errors corrupting some brick
> > filesystems (XFS; it required "xfs_repair -L" to fix).
> > Now I'm getting lots of "stale file handle" errors and other errors
> > (like directories that seem empty from the client but still contain
> > files in some bricks), and auto healing seems unable to complete.
> >
> > Since I can't keep up with manually fixing all the issues, I'm
> > thinking about a backup+destroy+recreate strategy.
> >
> > I think that if I reduce the number of bricks per server to just 5
> > (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost
> > of longer heal times in case a disk fails. Am I right, or is it
> > useless? Other recommendations?
> > Servers have room for another 6 disks. Maybe those could be used for
> > some SSDs to speed up access?
> >
> > TIA.
> >
> > --
> > Diego Zuccato
> > DIFA - Dip.
> > di Fisica e Astronomia
> > Servizi Informatici
> > Alma Mater Studiorum - Università di Bologna
> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > tel.: +39 051 20 95786
> > ________
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
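The per-PID `ps -o ppid=` lookups in this thread can be batched in one pass; the sketch below groups glfsheal children by parent over a canned `ps -eo pid=,ppid=,comm=` sample (the PIDs are illustrative, echoing the listings above, not live data):

```shell
# Canned sample of `ps -eo pid=,ppid=,comm=` output: pid, ppid, command.
ps_out='3266345 1 gluster
3266352 3266345 glfsheal
3267213 1 gluster
3267220 3267213 glfsheal
3268076 3267213 glfsheal'

# Count glfsheal children per parent PID. Each `gluster volume heal ...
# info` client normally owns its own glfsheal helpers, so a parent that
# accumulates many children (or many lingering parents) points at the
# spawning problem described above.
echo "$ps_out" | awk '$3 == "glfsheal" {n[$2]++}
                      END {for (p in n) print p, n[p]}' | sort
```

Live, the same pipeline would read directly from `ps -eo pid=,ppid=,comm=` instead of the canned variable.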
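If the piled-up glfsheal helpers eventually need pruning, elapsed time is a safer filter than killing everything by name; a sketch over a canned `ps -eo pid=,etimes=,comm=` sample (the PIDs and ages are made up for illustration):

```shell
# Canned sample of `ps -eo pid=,etimes=,comm=` output: pid, elapsed
# seconds, command (illustrative values, not live data).
ps_out='3266352 9000 glfsheal
3270354 120 glfsheal
4123 500 bash'

# Print glfsheal PIDs that have run for more than an hour; a heal-info
# helper still alive after that long is likely stuck.
echo "$ps_out" | awk '$3 == "glfsheal" && $2 > 3600 {print $1}'
```

Any actual `kill` of the selected PIDs should happen only after confirming their parent `gluster volume heal ... info` invocations are themselves stuck, since killing a helper mid-query just truncates the report.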