Hi Alexander,
Before I received this reply, I deregistered the cl1 user. It took a very long
time, and I am not sure whether it finished successfully, since the server
crashed once the next morning.
Then I moved the old changelog_catalog file and created a zero-length
changelog_user file instead.
This is what I got from the old changelog_catalog file.
# ls -l /tmp/changelog.dmp
-rw-r--r-- 1 root root 4153280 Dec 6 06:54 /tmp/changelog.dmp
# llog_reader changelog.dmp |grep "type=1064553b" |wc -l
63432
This number is smaller than 64768; I am not sure whether that is related to the
unfinished deregistration.
The first record number is 1 and the last record number is 64767, so I think
some record numbers may have been skipped:
# llog_reader changelog.dmp |grep "type=1064553b" |head -n 1
rec #1 type=1064553b len=64
# llog_reader changelog.dmp |grep "type=1064553b" |tail -n 1
rec #64767 type=1064553b len=64
# llog_reader changelog.dmp |grep "^rec" | grep -v "type=1064553b"
This returns 0 lines.
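With a first record number of 1, a last record number of 64767 and 63432 records counted, 64767 - 63432 = 1335 record numbers would be missing from that range. To list which ones, a minimal Python sketch could parse the llog_reader output. It assumes every record line has the form "rec #N type=T len=L" as shown above; the function names are only illustrative and are not part of any Lustre tool.

```python
import re

# Matches lines such as "rec #64767 type=1064553b len=64".
# Assumption: all record lines printed by llog_reader follow this layout.
REC_RE = re.compile(r"^rec #(\d+)\s+type=([0-9a-fA-F]+)\s+len=(\d+)")


def catalog_record_indices(lines, rec_type="1064553b"):
    """Return the sorted record numbers of the given type found in lines."""
    indices = []
    for line in lines:
        match = REC_RE.match(line.strip())
        if match and match.group(2).lower() == rec_type:
            indices.append(int(match.group(1)))
    return sorted(indices)


def missing_indices(indices):
    """Return record numbers absent between the first and last one seen."""
    if not indices:
        return []
    present = set(indices)
    return [i for i in range(indices[0], indices[-1] + 1) if i not in present]


# Small made-up sample, only to show the behaviour; not real catalog data.
sample = [
    "rec #1 type=1064553b len=64",
    "rec #2 type=1064553b len=64",
    "rec #5 type=1064553b len=64",
]
found = catalog_record_indices(sample)
print(len(found), missing_indices(found))
```

For the real dump, the lines could come from the saved output of "llog_reader changelog.dmp" read from a text file instead of the sample list.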
By the way, are the llog files you mentioned virtual or real? If they are real,
where are they located? Do I need to clean them up manually?
Thanks,
Lu,Wang
From: Alexander Boyko
Date: 2015-12-04 21:36
To: wanglu; lustre-discuss
Subject: RE [lustre-discuss] No free catalog slots for log ( Lustre 2.5.3 &
Robinhood 2.5.3 )
Here are 4 questions for which we cannot find answers in LU-1586:
1. According to Andres's reply, there should be some unconsumed changelog
files on our MDT, and these files have taken all the space (file quotas?)
Lustre gives to the changelog. With Lustre 2.1, these files are under the
OBJECTS directory and can be listed in ldiskfs mode. In our case, with Lustre
2.5.3, no OBJECTS directory can be found. In this case, how can we monitor
the situation before the unconsumed changelogs take up all the disk space?
The changelog is based on one catalog file and a set of plain llog files. The
catalog stores a limited number of records, about 64768. Each catalog record is
64 bytes and holds information about one plain llog file. A plain llog file
stores records about IO operations; it holds about 64768 records of varying
size. So the changelog can store about 64768^2 IO operations, and it occupies
filesystem space. The error "no free catalog slots" happens when the changelog
catalog doesn't have a slot to store a record about a new plain llog. Either
all slots are filled, or the internal changelog markers have gone wrong and the
internal logic doesn't work.
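As a rough check of the numbers above, a minimal Python sketch of the arithmetic follows. It takes the approximate figures from this explanation at face value (about 64768 slots in the catalog, about 64768 records per plain llog, 64 bytes per catalog record); the variable names are only illustrative.

```python
# Approximate figures quoted in the explanation, not exact on-disk limits.
CATALOG_SLOTS = 64768           # records the catalog can hold
RECORDS_PER_PLAIN_LLOG = 64768  # records one plain llog can hold
CATALOG_RECORD_BYTES = 64       # size of one catalog record

# Upper bound on changelog records: one plain llog per catalog slot.
max_changelog_records = CATALOG_SLOTS * RECORDS_PER_PLAIN_LLOG

# Space taken by the catalog records alone, excluding any header.
catalog_payload_bytes = CATALOG_SLOTS * CATALOG_RECORD_BYTES

print(max_changelog_records)   # 4194893824
print(catalog_payload_bytes)   # 4145152
```

So a full catalog corresponds to roughly 4.2 billion changelog records, while the catalog file itself stays at about 4 MB.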
To get closer to the root cause, you need to dump the changelog catalog and
check the bitmap. Are there free slots? Something like
debugfs -R "dump changelog_catalog changelog_catalog.dmp" /dev/md55 &&
used=`llog_reader changelog_catalog.dmp | grep "type=1064553b" | wc -l`
2. Why are there so many unconsumed changelogs? Could it be related to our
frequent remounts of the MDT (abort_recovery mode)?
A umount operation creates a half-empty plain llog file, and changelog_clear
can't remove it if all slots are freed. Only a new mount can remove that file.
It could be related or not.
3. When we remount the MDT, robinhood is still running. Why can't robinhood
consume those old changelogs after the MDT service is recovered?
4. Why is there a huge difference between the current index (4199610352) and
the cl1 index (49035933)?
Thank you for your time and help!
Wang,Lu
--
Alexander Boyko
Seagate
www.seagate.com