It is more of a precaution. The MDS/MGS kernel panicked a few times in as many days. The first couple of panics were under heavy load caused by a user. When I was bringing it back up, I ran e2fsck on all the targets, which found and fixed some corruption. But then the MGS/MDS kernel panicked as soon as I mounted the MGT and MDT; I hadn't even mounted any OSTs.

So to be careful, I have taken the filesystem offline and started running e2fsck --mdsdb on the MDT. The mdsdb is being written to local disk, so the slowness shouldn't be due to slow storage such as NFS. It's even an SSD.

It is pretty confusing that it is taking so long, though. I see one CPU pretty much pegged at >90%, and the mdsdb file does grow, albeit very slowly (about 6 hours pass before a few bytes are written to it).
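To put a number on "very slowly", the size of the mdsdb file can be sampled over time. The following is only a rough sketch: the function names `growth_rate` and `watch` and the path in the usage comment are hypothetical, not part of e2fsprogs or Lustre, and it assumes the database file is readable while e2fsck is running.

```python
import os
import time


def growth_rate(samples):
    """Average growth in bytes per second between the first and last sample.

    `samples` is a list of (timestamp_seconds, size_bytes) tuples.
    """
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0
    return (s1 - s0) / elapsed


def watch(path, interval=600, count=6):
    """Record the size of `path` every `interval` seconds, `count` times."""
    samples = []
    for i in range(count):
        samples.append((time.time(), os.stat(path).st_size))
        if i < count - 1:
            time.sleep(interval)
    return samples


# Usage (hypothetical path; substitute the file given to --mdsdb):
#   samples = watch("/tmp/mdsdb", interval=3600, count=7)
#   print(f"{growth_rate(samples):.6f} bytes/s")
```

A rate near zero over several hours, combined with one pegged CPU, would suggest the time is going into computation or database bookkeeping and not into waiting on the disk.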
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

-----Original Message-----
From: Dilger, Andreas [mailto:[email protected]]
Sent: Monday, May 26, 2014 12:15 PM
To: Andrus, Brian Contractor
Cc: [email protected]
Subject: Re: [Lustre-discuss] E2fsck running for a week so far...

On 2014/05/26, 9:03 AM, "Andrus, Brian Contractor" <[email protected]> wrote:

Is it normal for e2fsck running on an MDT with --mdsdb to take over a week? The entire MDT is only 500GB.

This is limited by the performance of the database that e2fsck is using for the mdsdb. If this is stored on e.g. NFS, and the database is large, then it will slow to a crawl.

Typically I don't recommend that users run the old lfsck unless there is a huge amount of corruption that needs to be fixed. Most of the problems it fixes can also be fixed in a different manner. What problem are you having?

Cheers, Andreas

So far it has only output:

e2fsck 1.42.7.wc2 (07-Nov-2013)
WORK=MDT0000 lustre database creation, check forced.
Pass 1: Checking inodes, blocks, and sizes
MDS: ost_idx 0 max_id 6351370
MDS: ost_idx 1 max_id 5766664
MDS: ost_idx 2 max_id 5821326
MDS: ost_idx 3 max_id 5720490
MDS: ost_idx 4 max_id 2889092
MDS: ost_idx 5 max_id 2654116
MDS: ost_idx 6 max_id 2805220
MDS: ost_idx 7 max_id 2895847
MDS: ost_idx 8 max_id 2932156
MDS: ost_idx 9 max_id 2777382
MDS: ost_idx 10 max_id 2764932
MDS: ost_idx 11 max_id 2655203
MDS: ost_idx 12 max_id 2742542
MDS: ost_idx 13 max_id 2856457
MDS: got 112 bytes = 14 entries in lov_objids
MDS: max_files = 32837426
MDS: num_osts = 14
mds info db file written
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Pass 6: Acquiring MDT information for lfsck

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
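As a quick consistency check on the Pass 1 lines quoted in this thread, the OST entries can be counted and compared with the reported totals. This is only a sketch: the regular expressions are assumptions based on the output pasted above, not a documented e2fsprogs log format, and `summarize` is a hypothetical helper name.

```python
import re

# The "MDS:" lines exactly as quoted in the thread.
LOG = """\
MDS: ost_idx 0 max_id 6351370
MDS: ost_idx 1 max_id 5766664
MDS: ost_idx 2 max_id 5821326
MDS: ost_idx 3 max_id 5720490
MDS: ost_idx 4 max_id 2889092
MDS: ost_idx 5 max_id 2654116
MDS: ost_idx 6 max_id 2805220
MDS: ost_idx 7 max_id 2895847
MDS: ost_idx 8 max_id 2932156
MDS: ost_idx 9 max_id 2777382
MDS: ost_idx 10 max_id 2764932
MDS: ost_idx 11 max_id 2655203
MDS: ost_idx 12 max_id 2742542
MDS: ost_idx 13 max_id 2856457
MDS: got 112 bytes = 14 entries in lov_objids
MDS: max_files = 32837426
MDS: num_osts = 14
"""


def summarize(log):
    """Extract per-OST max_id values and the reported totals from the log text."""
    ost_ids = {
        int(idx): int(max_id)
        for idx, max_id in re.findall(r"ost_idx (\d+) max_id (\d+)", log)
    }
    got = re.search(r"got (\d+) bytes = (\d+) entries", log)
    num_osts = re.search(r"num_osts = (\d+)", log)
    return {
        "ost_count": len(ost_ids),
        "bytes_per_entry": int(got.group(1)) // int(got.group(2)),
        "num_osts": int(num_osts.group(1)),
        "largest_max_id": max(ost_ids.values()),
    }


summary = summarize(LOG)
print(summary)
```

For the output in this thread, the number of ost_idx lines matches num_osts, and 112 bytes over 14 entries works out to 8 bytes per lov_objids entry, so the Pass 1 summary is at least internally consistent.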
