We were running 2.5.3.90 with changelogs enabled earlier this summer. We ran
into a catalog corruption issue (LU-6556) - we decided to deregister our
changelog users, move the CONFIGS/changelog_{catalog,users} files out of the
way, and carry on until we had an opportunity to upgrade. We did not remove
anything from /O/1/d* at that time (though we probably should have).
We've observed that mounting our MDT can take several minutes or more - I can
see with iostat that the MDT is very busy with reads while it is being mounted.
I suspect that those stale files in /O/1/d* are the reason (there are lots of
them), as they are processed by the OSP sync at MDT startup. I looked with
debugfs at the /O/1/d* directories - there are 1000s of files and their
timestamps are consistent with when we were using changelogs. I dumped a few
randomly selected ones and checked with llog_reader that the types of records
they contain are CHANGELOG_REC (type=10660000).
At the least, I think we should remove the files in /O/1/d* that contain
CHANGELOG_REC entries. Can I just delete every file in /O/1/d*, or do I need
to be careful and only remove the files that contain CHANGELOG_REC entries?
The reason I ask is that I do see a handful of files that are not
changelog-related in these directories - their timestamps are newer and their
record type as reported by llog_reader is not CHANGELOG_REC or CHANGELOG_USER.
There are only a small number of such files, though.
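A minimal sketch of how the dumps could be sorted by record type before
deciding what to remove. It only classifies text and deletes nothing. It
assumes each record line in the llog_reader output carries a "type=<hex>"
token, as in the type=10660000 seen above; the function name, the labels it
returns, and the sample lines are made up for illustration, and the second
sample type value is a placeholder, not a real record type.

```python
import re

# Record type taken from the llog_reader dumps described above.
CHANGELOG_REC = "10660000"

# Assumption: every record line in a dump contains a "type=<hex>" token.
TYPE_RE = re.compile(r"\btype=([0-9a-fA-F]+)\b")


def classify_dump(dump_text, changelog_types=(CHANGELOG_REC,)):
    """Label one llog_reader dump by the record types it contains.

    Returns "changelog-only", "mixed", "other", or "no-records".
    """
    seen = {t.lower() for t in TYPE_RE.findall(dump_text)}
    wanted = {t.lower() for t in changelog_types}
    if not seen:
        return "no-records"
    if seen <= wanted:
        return "changelog-only"
    if seen & wanted:
        return "mixed"
    return "other"


if __name__ == "__main__":
    # Hypothetical dump text, not real llog_reader output.
    stale = "rec #1 type=10660000 len=64\nrec #2 type=10660000 len=64\n"
    newer = "rec #1 type=deadbeef len=64\n"  # placeholder type value
    print(classify_dump(stale))
    print(classify_dump(newer))
```

Only files labelled "changelog-only" would be candidates for removal under
this scheme; anything "mixed" or "other" would be left alone until someone
confirms what it is.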
Thanks,
Craig Prescott
University of Florida Research Computing