What devices are underneath dm-21, and are there any errors in /var/log/messages for those devices (assuming /dev/sdX devices underneath)?
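A minimal sketch of that check, assuming the standard sysfs layout (`/sys/block/<dm>/slaves`) and the `/var/log/messages` path mentioned in this thread. The function names and the optional sysfs-root argument are hypothetical helpers added for illustration, not part of any Lustre or device-mapper tool.

```shell
# List the block devices directly beneath a device-mapper node.
# $1: dm name (e.g. dm-21); $2: sysfs root, defaults to /sys (overridable for testing)
list_slaves() {
    ls "${2:-/sys}/block/$1/slaves" 2>/dev/null
}

# Print every log line that mentions one of those underlying devices.
# $1: dm name; $2: log file; $3: sysfs root, defaults to /sys
grep_slave_errors() {
    for dev in $(list_slaves "$1" "${3:-/sys}"); do
        # -w matches the device name as a whole word, so "sdb" does not match "sdbc";
        # a device with no matching lines is not treated as a failure
        grep -w -- "$dev" "$2" || true
    done
}

# Example invocation on the OSS (not run here):
#   list_slaves dm-21
#   grep_slave_errors dm-21 /var/log/messages
```

If the devices listed are themselves dm-N nodes (for example multipath maps), the same check would need to be repeated one level down to reach the /dev/sdX paths.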
Run `ls /sys/block/dm-21/slaves` to see what devices are beneath dm-21.

On Tue, Jul 6, 2021 at 20:09 David Cohen <cda...@physics.technion.ac.il> wrote:

> Hi,
> The index of the OST is unique in the system and free for the new one, as
> it is increased by "1" for every new OST created, so whatever it converts
> to should not be relevant to its refusal to mount, or am I mistaken?
>
> I'm pasting the log messages again, in case they were lost up the thread,
> adding the output of "fdisk -l", should the OST size be the issue:
>
> lctl dk shows tens of thousands of lines repeating the same error after
> attempting to mount the OST:
>
> 00100000:10000000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 00100000:10000000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
> 00100000:10000000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
>
> in /var/log/messages I see the following corresponding to dm-21 which is
> the new OST:
>
> Jul 6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, please wait.
> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled, maximum tree depth=5
> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous mount: IO failure
> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs with errors, running e2fsck is recommended
> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21): htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad entry in directory: rec_len is too small for name_len - offset=4084(4084), inode=0, rec_len=12, name_len=0
> Jul 6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem read-only
> Jul 6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21): kmmpd:187: kmmpd being stopped since filesystem has been remounted as readonly.
> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last fsck: 6
> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
>
> fdisk -l /dev/mapper/OST0051
>
> Disk /dev/mapper/OST0051: 142799.1 GB, 142799072657408 bytes, 34863054848 sectors
> Units = sectors of 1 * 4096 = 4096 bytes
> Sector size (logical/physical): 4096 bytes / 4096 bytes
> I/O size (minimum/optimal): 2097152 bytes / 2097152 bytes
>
> Thanks,
> David
>
> On Tue, Jul 6, 2021 at 10:35 PM Spitz, Cory James <cory.sp...@hpe.com> wrote:
>
>> What OST index (number) were you trying to add?
>>
>> Andreas is right:
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>> And you said:
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>>
>> Ok, 0051 (in octal by way of the leading zeros*) translates to decimal 41
>> as Andreas pointed out, but that’s 0x29 in hexadecimal, not 0x33. Assuming
>> you wanted to use decimal 51 then you’d have tried to mkfs.lustre the wrong
>> index. So, if you wanted to use decimal 51, you’d have used --index=0x33 or
>> --index=0063.
>>
>> -Cory
>>
>> p.s.
>>
>> (*) BTW, the convention with leading zeros for octal can be googled or
>> read about at https://en.wikipedia.org/wiki/Octal.
>>
>> On 7/6/21, 12:35 AM, "lustre-discuss on behalf of David Cohen" <lustre-discuss-boun...@lists.lustre.org on behalf of cda...@physics.technion.ac.il> wrote:
>>
>> Thanks Andreas,
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>> I don't believe that's the reason for the failed mount as it is only an
>> index that I increase for every new OST and there are no duplicates.
>>
>> lctl dk shows tens of thousands of lines repeating the same error after
>> attempting to mount the OST:
>>
>> 00100000:10000000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
>> 00100000:10000000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
>> 00100000:10000000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>> in /var/log/messages I see the following corresponding to dm-21 which is
>> the new OST:
>>
>> Jul 6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, please wait.
>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled, maximum tree depth=5
>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous mount: IO failure
>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs with errors, running e2fsck is recommended
>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21): htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad entry in directory: rec_len is too small for name_len - offset=4084(4084), inode=0, rec_len=12, name_len=0
>> Jul 6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem read-only
>> Jul 6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21): kmmpd:187: kmmpd being stopped since filesystem has been remounted as readonly.
>> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last fsck: 6
>> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
>> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
>>
>> As I mentioned before mount never completes so the only way out of that
>> is force reboot.
>>
>> Thanks,
>> David
>>
>> On Tue, Jul 6, 2021 at 8:07 AM Andreas Dilger <adil...@whamcloud.com> wrote:
>>
>> On Jul 5, 2021, at 09:05, David Cohen <cda...@physics.technion.ac.il> wrote:
>>
>> Hi,
>>
>> I'm using Lustre 2.10.5 and lately tried to add a new OST.
>>
>> The OST was formatted with the command below, which other than the index
>> is the exact same one used for all the other OSTs in the system.
>>
>> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost --fsname=local --index=0051 --param ost.quota_type=ug --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp --mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>> When trying to mount it with:
>> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051
>>
>> The system stays on 100% CPU (one core) forever and the mount never
>> completes, not even after a week.
>>
>> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the
>> other targets, but the behaviour remains the same.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud
>>
>> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
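The index arithmetic discussed in the quoted messages can be checked with a short shell sketch. It applies C-style literal rules (leading `0` means octal, leading `0x` means hex) through shell arithmetic; whether mkfs.lustre parses `--index` this way is not confirmed in this thread, and the logs above show `local-OST0033`, which corresponds to decimal 51. The function name `show_index` is hypothetical, and the four-digit hex name format is inferred from the `OST0033` naming seen in the logs.

```shell
# Show how an index literal reads under C-style rules and what
# four-digit hex OST name that value would correspond to.
show_index() {
    # $(( ... )) treats a leading 0 as octal and a leading 0x as hex
    value=$(( $1 ))
    printf '%s -> decimal %d, hex 0x%x, octal 0%o, OST name OST%04x\n' \
        "$1" "$value" "$value" "$value" "$value"
}

show_index 0051   # 0051 -> decimal 41, hex 0x29, octal 051, OST name OST0029
show_index 51     # 51 -> decimal 51, hex 0x33, octal 063, OST name OST0033
show_index 0x51   # 0x51 -> decimal 81, hex 0x51, octal 0121, OST name OST0051
show_index 0063   # 0063 -> decimal 51, hex 0x33, octal 063, OST name OST0033
```

This only verifies the conversions quoted in the thread: octal 51 is decimal 41 (0x29), decimal 51 is 0x33 (octal 63), and 0x51 is decimal 81.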