Hi Srijan,

After a third run of the quota_fsck script, the quotas got fixed! Everything is working normally again.

Thank you for your help!

João Baúto
---------------
Scientific Computing and Software Platform
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org <https://www.fchampalimaud.org/>
Srijan Sivakumar <ssiva...@redhat.com> wrote on Wednesday, 19/08/2020 at 18:04:

Hi João,

I'd recommend going with the disable/enable of the quota, as that would eventually do the same thing. That is a better option than manually changing the parameters in the said command.

Thanks and Regards,
Srijan Sivakumar
Associate Software Engineer, Red Hat

On Wed, Aug 19, 2020 at 8:12 PM João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi Srijan,

Before I do the disable/enable, I just want to check something with you. On the other cluster, where the crawl is running, I can see the find command and also this one, which seems to be the one triggering the crawler (4 processes, one per brick, on all nodes):

  /usr/sbin/glusterfs -s localhost --volfile-id client_per_brick/tank.client.hostname.tank-volume1-brick.vol --use-readdirp=yes --client-pid -100 -l /var/log/glusterfs/quota_crawl/tank-volume1-brick.log /var/run/gluster/tmp/mntYbIVwT

Can I manually trigger this command?

Thanks!

Srijan Sivakumar <ssiva...@redhat.com> wrote on Wednesday, 19/08/2020 at 07:25:

Hi João,

If the crawl is not going on and the values are still not reflecting properly, then the crawl process has ended abruptly.

Yes, technically disabling and enabling the quota will trigger a crawl, but it would be a complete crawl of the filesystem, so it would take time and consume resources. Usually disabling and enabling is the last resort when the accounting isn't reflecting properly, but if you're going to merge these two clusters, you can probably go ahead with the merge and then enable quota.
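For reference, the disable/enable cycle being discussed is, in outline, the standard quota CLI (volume name is a placeholder here; disabling clears the configured limits, so each one has to be re-applied afterwards, which is the config loss João mentions further down the thread):

  $ gluster volume quota <VOLNAME> disable
  $ gluster volume quota <VOLNAME> enable
  $ gluster volume quota <VOLNAME> limit-usage /projectB 100TB   # repeat per project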
On Wed, Aug 19, 2020 at 3:53 AM João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi Srijan,

I didn't get any result with that command, so I went to our other cluster (we are merging two clusters; the data is replicated) and activated the quota feature on the same directory. Running the same command on each node, I get output similar to yours (one process per brick, I'm assuming):

  root 1746822  1.4  0.0 230324 2992 ?  S  23:06  0:04 /usr/bin/find . -exec /usr/bin/stat {} \;
  root 1746858  5.3  0.0 233924 6644 ?  S  23:06  0:15 /usr/bin/find . -exec /usr/bin/stat {} \;
  root 1746889  3.3  0.0 233592 6452 ?  S  23:06  0:10 /usr/bin/find . -exec /usr/bin/stat {} \;
  root 1746930  3.1  0.0 230476 3232 ?  S  23:06  0:09 /usr/bin/find . -exec /usr/bin/stat {} \;

At this point, is it easier to just disable and enable the feature and force a new crawl? We don't mind a temporary increase in CPU and IO usage.

Thank you again!

Srijan Sivakumar <ssiva...@redhat.com> wrote on Tuesday, 18/08/2020 at 21:42:

Hi João,

There isn't a straightforward way of tracking the crawl, but since Gluster uses find and stat during the crawl, one can run the following command:

  # ps aux | grep find

If the output is of the form

  root 1513  0.0  0.1 127224 2636 ?  S  12:24  0:00 /usr/bin/find . -exec /usr/bin/stat {} \;

then the crawl is still going on.

On Wed, Aug 19, 2020 at 1:46 AM João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi Srijan,

Is there a way of getting the status of the crawl process? We are going to expand this cluster, adding 12 new bricks (around 500TB), and we rely heavily on the quota feature to control the space usage of each project. The crawl has been running since Saturday (nothing has changed) and I'm unsure whether it will finish tomorrow or take weeks.

Thank you!

Srijan Sivakumar <ssiva...@redhat.com> wrote on Sunday, 16/08/2020 at 06:11:

Hi João,

Yes, it'll take some time given the filesystem size, as it has to change the xattrs at each level and then crawl upwards.

The stat is done by the script itself, so the crawl is initiated.

Regards,
Srijan Sivakumar
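If it helps to watch the crawl per node, the same check can be scripted along the lines of Srijan's ps suggestion above; a minimal sketch (the [f] trick just stops grep from matching itself, and on this cluster you would expect one match per brick on the node):

  # count quota-crawl workers on this node
  $ ps aux | grep -c '[f]ind . -exec /usr/bin/stat'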
On Sun 16 Aug 2020, 04:58, João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi Srijan & Strahil,

I ran the quota_fsck script mentioned in Hari's blog post on all bricks and it detected a lot of size mismatches.

The script was executed (on all nodes and bricks) as:

  python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank /tank/volume2/brick

Here is a snippet of its output:

  Size Mismatch /tank/volume2/brick/projectB {'parents': {'00000000-0000-0000-0000-000000000001': {'contri_file_count': 18446744073035296610L, 'contri_size': 18446645297413872640L, 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, 'size': 18446645297413872640L} 15204281691754
  MARKING DIRTY: /tank/volume2/brick/projectB
  stat on /mnt/tank/projectB
  Files verified : 683223
  Directories verified : 46823
  Objects Fixed : 705230

Checking the xattrs on the bricks, I can see the directory in question marked as dirty:

  # getfattr -d -m. -e hex /tank/volume2/brick/projectB
  getfattr: Removing leading '/' from absolute path names
  # file: tank/volume2/brick/projectB
  trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
  trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705
  trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
  trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
  trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea
  trusted.glusterfs.quota.dirty=0x3100
  trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
  trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea

Now, my question is: how do I trigger Gluster to recalculate the quota for this directory? Is it automatic but takes a while? The quota list did change, but not to a good "result":

  Path        Hard-limit  Soft-limit   Used       Available  Soft-limit exceeded?  Hard-limit exceeded?
  /projectB   100.0TB     80%(80.0TB)  16383.9PB  190.1TB    No                    No

I would like to avoid a quota disable/enable on the volume, as that removes the configs.

Thank you for all the help!
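These quota xattrs can be sanity-checked by hand: trusted.glusterfs.quota.size.1 packs three big-endian 64-bit fields, assuming the usual layout (bytes used, file count, directory count) that the quota_fsck script also reads. A minimal decode of the value above, in a bash shell:

  $ v=00000ca6ccf7a80000000000000790a1000000000000b6ea
  $ printf 'size=%d files=%d dirs=%d\n' $((16#${v:0:16})) $((16#${v:16:16})) $((16#${v:32:16}))
  size=13910542886912 files=495777 dirs=46826

That is about 12.7 TiB and 46826 directories contributed by this brick, which lines up with the directory count the script verified.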
Srijan Sivakumar <ssiva...@redhat.com> wrote on Saturday, 15/08/2020 at 11:57:

Hi João,

The quota accounting error is what we're looking at here. I think you've already looked into the blog post by Hari and are using the script to fix the accounting issue. That should help you fix this.

Let me know if you face any issues while using it.

Regards,
Srijan Sivakumar

On Fri 14 Aug 2020, 17:10, João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi Strahil,

I have tried removing the quota for that specific directory and setting it again, but it didn't work (maybe it has to be a quota disable and enable in the volume options). I'm currently testing a solution by Hari with the quota_fsck.py script (https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a) and it is detecting a lot of size mismatches in files.

Thank you,

Strahil Nikolov <hunter86...@yahoo.com> wrote on Friday, 14/08/2020 at 10:16:

Hi João,

Based on your output, it seems that the quota size is different on the two bricks.

Have you tried removing the quota and then recreating it? Maybe that would be the easiest way to fix it.

Best Regards,
Strahil Nikolov
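The remove-and-recreate step Strahil suggests (and which João reports trying above, without success) would look roughly like this with the standard quota CLI, volume name again a placeholder; it only resets the limit for one path and does not force the full re-crawl that disable/enable triggers:

  $ gluster volume quota <VOLNAME> remove /projectB
  $ gluster volume quota <VOLNAME> limit-usage /projectB 100TB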
On 14 August 2020 at 4:35:14 GMT+03:00, João Baúto <joao.ba...@neuro.fchampalimaud.org> wrote:

Hi all,

We have a 4-node distributed cluster with 2 bricks per node running Gluster 7.7 + ZFS. We use directory quotas to limit the space used by our members on each project. Two days ago we noticed inconsistent space usage reported by Gluster in the quota list.

A small snippet of gluster volume quota vol list:

  Path        Hard-limit  Soft-limit   Used       Available  Soft-limit exceeded?  Hard-limit exceeded?
  /projectA   5.0TB       80%(4.0TB)   3.1TB      1.9TB      No                    No
  /projectB   100.0TB     80%(80.0TB)  16383.4PB  740.9TB    No                    No
  /projectC   70.0TB      80%(56.0TB)  50.0TB     20.0TB     No                    No

The total space available in the cluster is 360TB; the quota for projectB is 100TB and, as you can see, it is reporting 16383.4PB used and 740TB available (already down from 750TB).

There was an issue in Gluster 3.x related to wrong directory quotas (https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html and https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html), but it's marked as solved (not sure whether the solution still applies).

On projectB:

  # getfattr -d -m . -e hex projectB
  # file: projectB
  trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
  trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9
  trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
  trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
  trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
  trusted.glusterfs.quota.dirty=0x3000
  trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
  trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112

On projectA:

  # getfattr -d -m . -e hex projectA
  # file: projectA
  trusted.gfid=0x05b09ded19354c0eb544d22d4659582e
  trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64
  trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd
  trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b
  trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498
  trusted.glusterfs.quota.dirty=0x3000
  trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff
  trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498

Any idea what's happening and how to fix it?

Thanks!
João Baúto
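The impossible "16383.4PB used" is itself a clue: 2^64 bytes is exactly 16384 PiB, so a signed 64-bit size counter that has gone slightly negative is displayed as a huge unsigned number just under that. The same pattern is visible in the raw xattrs above; the last 8 bytes of projectB's trusted.glusterfs.quota.size.1 (the directory-count field, assuming the size/file-count/dir-count layout noted earlier) decode to a negative value:

  $ python -c 'print(0xffffffffffffc112 - 2**64)'
  -16110

This also matches the 18446744073709527653-style values in the quota_fsck output earlier in the thread: 18446744073709527653 - 2**64 = -23963, i.e. a directory count that has underflowed to -23963.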
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users