Thanks, Cory.  We are still running 2.5.3.90, which doesn't have that fix.  
That patch looks like it would solve our slow-to-mount MDT.  FWIW, I don't 
think we have many (any?) empty plain llogs, but the removal of the 
llog_process_or_fork() call in llog_cat_init_and_process() looks like it
addresses our issue - I see llog_cat_init_and_process() in the stacks of the
osp-syn-* threads while the MDT is being read like crazy during mounts.


As a followup - is there any reason *not* to unmount the MDT, mount it as 
ldiskfs, and simply delete the plain llogs in our MDT's O/1/d* folders that 
contain only CHANGELOG_REC records?   Or even every file under the MDT's O/1/d* 
folders?  I'm a little unsure.  It seems that most of (if not all of) the files 
there now are just taking up space, and nothing else is going to remove them.

FWIW, our intent is to start using changelogs and robinhood again after we 
upgrade to a later version of Lustre than what we are currently running, at 
which time we'll just start over - register new changelog users and rescan the 
whole filesystem.  We won't care about any prior history.


Thanks again,

Craig


________________________________
From: Cory Spitz <[email protected]>
Sent: Monday, December 5, 2016 5:30 PM
To: Prescott,Craig P; [email protected]
Subject: Re: [lustre-discuss] Changelog record cleanup in /O/1/d*

Craig, FWIW, this sounds a lot like https://jira.hpdd.intel.com/browse/LU-5038, 
which was addressed in 2.7.0.
-Cory

--


From: lustre-discuss <[email protected]> on behalf of 
"Prescott,Craig P" <[email protected]>
Date: Monday, December 5, 2016 at 3:02 PM
To: "[email protected]" <[email protected]>
Subject: [lustre-discuss] Changelog record cleanup in /O/1/d*




We were running 2.5.3.90 with changelogs enabled earlier this summer.  We ran 
into a catalog corruption issue (LU-6556) - we decided to deregister our 
changelog users, move the CONFIGS/changelog_{catalog,users} files out of the 
way, and carry on until we had an opportunity to upgrade.  We did not remove 
anything from /O/1/d* at that time (though we probably should have).



We've observed that mounting our MDT can take several-to-many minutes - I can 
see with iostat that the MDT is very busy with reads while it is being mounted. 
I suspect that those stale files in /O/1/d* are the reason (there are lots of
them), as they are processed by the OSP sync at MDT startup.   I looked with 
debugfs at the /O/1/d* directories - there are 1000s of files and their 
timestamps are consistent with when we were using changelogs.  I dumped a few 
randomly selected ones and checked with llog_reader that the types of records 
they contain are CHANGELOG_REC (type=10660000).
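The spot check described above (dump a few random llog files, then run
llog_reader on them) can be sketched in Python as below. This is a minimal
sketch under stated assumptions. The MDT device path (`/dev/mdt0` in the usage
note) is hypothetical, the list of llog paths under /O/1/d* is assumed to be
known already (e.g. from debugfs `ls`), and `dump_target`, `debugfs_dump_cmd`
and `sample_and_dump` are illustrative names, not Lustre tools. It calls
debugfs in catastrophic mode (`-c`, which opens the filesystem read-only) with
its `dump` request, then runs `llog_reader <file>` on each copy.

```python
import random
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def dump_target(llog_path: str, out_dir: str) -> Path:
    """Local copy name; keeps the d* directory name to avoid collisions."""
    src = Path(llog_path)
    return Path(out_dir) / f"{src.parent.name}_{src.name}"


def debugfs_dump_cmd(device: str, llog_path: str, out_dir: str) -> List[str]:
    """Build a debugfs command that copies one llog file off the MDT.

    -c opens the device in catastrophic mode, which is read-only, so the
    MDT itself is not modified.
    """
    dst = dump_target(llog_path, out_dir)
    return ["debugfs", "-c", "-R", f"dump {llog_path} {dst}", device]


def sample_and_dump(device: str, llog_paths: List[str], out_dir: str,
                    k: int = 5, seed: Optional[int] = None) -> Dict[str, str]:
    """Dump k randomly chosen llog files; return {path: llog_reader output}."""
    rng = random.Random(seed)
    picked = rng.sample(llog_paths, min(k, len(llog_paths)))
    results = {}
    for path in picked:
        subprocess.run(debugfs_dump_cmd(device, path, out_dir), check=True)
        dumped = str(dump_target(path, out_dir))
        out = subprocess.run(["llog_reader", dumped], check=True,
                             capture_output=True, text=True)
        results[path] = out.stdout
    return results
```

Calling `sample_and_dump("/dev/mdt0", paths, "/tmp/llogs", k=10)` would need
root and the Lustre utilities installed; debugfs runs read-only here.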



At the least, I think we should remove the files in /O/1/d* that contain
CHANGELOG_REC entries.  Can I just delete every file in /O/1/d*, or do I need
to be careful and only remove the files that contain CHANGELOG_REC entries?



The reason I ask is that I do see a handful of files that are not 
changelog-related in these directories - their timestamps are newer and their 
record type as reported by llog_reader is not CHANGELOG_REC or CHANGELOG_USER.  
There are only a small number of such files, though.
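A selective cleanup like the one asked about above would start from a list of
changelog-only files, and this Python sketch shows one conservative way to
build it. It takes llog_reader text for each file and marks a file as a
removal candidate only if at least one record was parsed and every record has
type 0x10660000 (CHANGELOG_REC, as reported by llog_reader above). Files with
any other record type, such as the handful of newer non-changelog llogs, or
with no parsable records, are kept. It assumes each record in llog_reader
output carries a `type=<hex>` token like the one quoted above; the exact layout
may differ between Lustre versions. `record_types` and `partition` are
hypothetical helper names, and the sketch only builds a list; it does not
establish that deleting those files is safe.

```python
import re
from typing import Dict, List, Set, Tuple

# 0x10660000 is CHANGELOG_REC, the type llog_reader reported above.
CHANGELOG_REC = 0x10660000

# Assumption: every record in llog_reader output includes "type=<hex>".
TYPE_RE = re.compile(r"\btype=(?:0x)?([0-9a-fA-F]+)")


def record_types(dump_text: str) -> Set[int]:
    """Collect the distinct record types found in one llog_reader dump."""
    return {int(m.group(1), 16) for m in TYPE_RE.finditer(dump_text)}


def partition(dumps: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split llog files into (changelog_only, keep), sorted by path.

    A file is a candidate only if it has at least one parsed record and
    all of its records are CHANGELOG_REC; anything else is kept.
    """
    changelog_only, keep = [], []
    for path, text in sorted(dumps.items()):
        types = record_types(text)
        if types and types == {CHANGELOG_REC}:
            changelog_only.append(path)
        else:
            keep.append(path)
    return changelog_only, keep
```

Treating "no parsable records" as "keep" errs on the side of leaving files in
place when the llog_reader output does not match the assumed format.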



Thanks,

Craig Prescott
University of Florida Research Computing
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
