What does the system say when you try an 'mmchdisk ... resume' or 'mmchdisk ... start' on nsd_home_5T2? And what does /var/adm/ras/mmfs.log.latest say?
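Something along these lines (just a sketch, assuming the file system device is gpfs1 and the revived disk is nsd_home_5T2, as in your output below):

    # try to bring the down disk back online and see what GPFS reports
    mmchdisk gpfs1 start -d "nsd_home_5T2"

    # list any disks that are still not ready/up
    mmlsdisk gpfs1 -e

    # then look at the tail of the GPFS log on the NSD server(s) for that disk
    tail -n 100 /var/adm/ras/mmfs.log.latest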
Ed Wahl
OSC

________________________________________
From: [email protected] [[email protected]] on behalf of Roman Baranowski [[email protected]]
Sent: Monday, May 04, 2015 4:12 AM
To: [email protected]
Subject: [gpfsug-discuss] Failed NSD - help appreciated

Dear All,

First of all, my apologies if this is not the appropriate place to pose such a post. However .....

I have an old IBM cluster running GPFS version 3.2.

mmlsconfig:

clusterName Moraines.westgrid.ubc
clusterType lc
autoload yes
minReleaseLevel 3.2.1.5
dmapiFileHandleSize 32
pagepool 128M
[moraine9]
pagepool 1536M
[moraine1,moraine2,moraine3,moraine4,moraine5,moraine6,moraine7,moraine8]
pagepool 2048M
[common]
dataStructureDump /var/tmp/mmfs
maxFilesToCache 10000

File systems in cluster Moraines.westgrid.ubc:
----------------------------------------------
/dev/gpfs1
/dev/gpfs2

Some time ago we suffered a few double disk failures on our SAN. Since then /dev/gpfs1 cannot be mounted, and mmfsck on that file system fails with:

Error accessing inode file.
InodeProblemList: 4 entries
  iNum            snapId      status  keep  delete  noScan  new  error
  --------------  ----------  ------  ----  ------  ------  ---  ------------------
  0               0           3       0     0       0       1    0x10000010 AddrCorrupt IndblockBad
  1               0           3       0     0       0       1    0x00000010 AddrCorrupt
  2               0           3       0     0       0       1    0x00000010 AddrCorrupt
  3               0           1       0     0       0       1    0x00000010 AddrCorrupt
File system check has ended prematurely.
Errors were encountered which could not be corrected.
Exit status 22:2:26.
mmfsck: Command failed. Examine previous error messages to determine cause.

We have since corrected all the SAN disk failures. "mmlsdisk gpfs1" now shows:

disk          driver  sector  failure  holds     holds                                storage
name          type    size    group    metadata  data   status         availability   pool
------------  ------  ------  -------  --------  -----  -------------  -------------  ------------
nsd_home_1T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_2T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_3T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_4T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_5T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_1T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_2T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_3T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_4T2  nsd     512     -1       yes       yes    ready          unrecovered    system
nsd_home_5T2  nsd     512     -1       yes       yes    ready          down           system
nsd_home_6T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_7T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_6T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_7T2  nsd     512     -1       yes       yes    ready          up             system
Attention: Due to an earlier configuration change the file system may contain data that is at risk of being lost.

We are sure that nsd_home_4T2 is gone (double disk failure on the SAN). nsd_home_5T2 (marked down) also suffered a failure, but using the SAN storage manager we were able to revive the array, and it should contain valid, good data. However, all attempts to start that NSD have failed.
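For reference, the failing check above, and a quick sanity check of the NSD-to-device mapping after the SAN repair, would look roughly like this (a sketch only; the exact options used may have differed):

    # read-only check: report the damage without attempting any repair
    mmfsck gpfs1 -n -v

    # verify that the NSD names still map to the expected devices
    mmlsnsd -m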
We decided to remove the 'bad' nsd_home_4T2 with:

    mmdeldisk gpfs1 nsd_home_4T2 -p

and

    mmdeldisk gpfs1 nsd_home_5T2 -c

The current state is:

disk          driver  sector  failure  holds     holds                                storage
name          type    size    group    metadata  data   status         availability   pool
------------  ------  ------  -------  --------  -----  -------------  -------------  ------------
nsd_home_1T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_2T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_3T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_4T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_5T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_1T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_2T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_3T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_4T2  nsd     512     -1       yes       yes    allocmap delp  down           system
nsd_home_5T2  nsd     512     -1       yes       yes    being emptied  down           system
nsd_home_6T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_7T1  nsd     512     -1       yes       yes    ready          up             system
nsd_home_6T2  nsd     512     -1       yes       yes    ready          up             system
nsd_home_7T2  nsd     512     -1       yes       yes    ready          up             system
Attention: Due to an earlier configuration change the file system may contain data that is at risk of being lost.

Note: the gpfs1 FS (home) was ~90% full at the moment of the failure.

The question I am posing here (any help or suggestions appreciated) is the following: is there anything we can do to recover at least some partial data without removing gpfs1 and restoring from the backup (some other long story and issues we are currently addressing)? We also have some unused capacity, a free NSD:

(free disk) nsd_home_8T2 moraine1.westgrid.ubc,moraine2.westgrid.ubc

which could eventually be used.

All the best,
Roman

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
