OK, so, we rebooted 10.10.1.49 into a different, non-Lustre kernel. Then, to be as certain as I could be that the client did not know about 10.10.1.49, I rebooted the client as well. After it was fully up (with the Lustre file system mount in /etc/fstab), I unmounted it, then mounted it again as below. And the message still came back that it was trying to contact 10.10.1.49 instead of 10.10.1.48, as it should. To repeat, dmesg is logging:
Lustre: mgc10.10.1....@tcp: Reactivating import
Lustre: 10523:0:(obd_mount.c:1786:lustre_check_exclusion()) Excluding umt3-OST0019 (on exclusion list)
Lustre: 5936:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1355139761832543 sent from umt3-MDT0000-mdc-ffff81062c82c400 to NID 10.10.1...@tcp 0s ago has failed due to network error (5s prior to deadline).
  r...@ffff81060e4ebc00 x1355139761832543/t0 o38->[email protected]@tcp:12/10 lens 368/584 e 0 to 1 dl 1292362202 ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: Client umt3-client has started

I guess I need to know why in the world this client is still trying to access 10.10.1.49. Is there something, perhaps, on the MGS machine that is causing this misdirect? What? And, most importantly, how do I fix this?

bob

On 12/14/2010 3:05 PM, Bob Ball wrote:
> Well, you are absolutely right, it is a timeout talking to what it
> THINKS is the MDT. The thing is, it is NOT!
>
> We were set up for HA for the MDT, with 10.10.1.48 and 10.10.1.49
> watching and talking to one another. The RedHat service was
> problematic, so right now 10.10.1.48 is the MDT and has /mnt/mdt
> mounted, while 10.10.1.49 is being used to do backups and has
> /mnt/mdt_snapshot mounted. The actual volume is an iSCSI location.
>
> So, somehow, the client node has found and is talking to the wrong
> host! Not good. Scary. Got to do something about this.
>
> Suggestions appreciated.
>
> bob
>
> On 12/14/2010 11:57 AM, Andreas Dilger wrote:
>> The error message shows a timeout connecting to umt3-MDT0000 and not the
>> OST. The operation 38 is MDS_CONNECT, AFAIK.
>>
>> Cheers, Andreas
>>
>> On 2010-12-14, at 9:19, Bob Ball <[email protected]> wrote:
>>
>>> I am trying to get a Lustre client to mount the service, but with one or
>>> more OSTs disabled. This does not appear to be working. Lustre version
>>> is 1.8.4.
>>>
>>> mount -o localflock,exclude=umt3-OST0019 -t lustre
>>>     10.10.1....@tcp0:/umt3 /lustre/umt3
>>>
>>> dmesg on this client shows the following during the umount/mount sequence:
>>>
>>> Lustre: client ffff810c25c03800 umount complete
>>> Lustre: Skipped 1 previous similar message
>>> Lustre: mgc10.10.1....@tcp: Reactivating import
>>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Excluding
>>> umt3-OST0019 (on exclusion list)
>>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Skipped 1
>>> previous similar message
>>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>>> x1354682302740498 sent from umt3-MDT0000-mdc-ffff810628209000 to NID
>>> 10.10.1...@tcp 0s ago has failed due to network error (5s prior to
>>> deadline).
>>> r...@ffff810620e66400 x1354682302740498/t0
>>> o38->[email protected]@tcp:12/10 lens 368/584 e 0 to 1 dl
>>> 1292342239 ref 1 fl Rpc:N/0/0 rc 0/0
>>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
>>> previous similar message
>>> Lustre: Client umt3-client has started
>>>
>>> When I check following the mount, using "lctl dl", I see the following,
>>> and it is clear that the OST is active as far as this client is concerned:
>>>
>>> 19 UP osc umt3-OST0018-osc-ffff810628209000
>>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>> 20 UP osc umt3-OST0019-osc-ffff810628209000
>>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>> 21 UP osc umt3-OST001a-osc-ffff810628209000
>>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>>
>>> Two questions here. The first, obviously, is: what is wrong with this
>>> picture? Why can't I exclude this OST from activity on this client? Is
>>> it because the OSS serving that OST still has the OST active? If the
>>> OST were deactivated or otherwise unavailable on the OSS, would the
>>> client mount then succeed in excluding this OST? (OK, more than one
>>> question in the group...)
>>>
>>> Second group: what is the correct syntax for excluding more than one
>>> OST? Is it a comma-separated list of exclusions, or are separate
>>> excludes required?
>>>
>>> mount -o localflock,exclude=umt3-OST0019,umt3-OST0020 -t lustre
>>>     10.10.1....@tcp0:/umt3 /lustre/umt3
>>> or
>>> mount -o localflock,exclude=umt3-OST0019,exclude=umt3-OST0020 -t
>>>     lustre 10.10.1....@tcp0:/umt3 /lustre/umt3
>>>
>>> Thanks,
>>> bob
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
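For anyone chasing the same two problems in the archive, here is a minimal sketch of the usual 1.8-era diagnosis and fix. The device path /dev/mdt_dev is a placeholder, the writeconf recipe assumes all targets and clients can be unmounted first, and the --mgsnode value assumes the MGS lives on 10.10.1.48; check tunefs.lustre(8) and mount.lustre(8) against your own setup before running any of this:

```shell
# Client side: which NID is the MDC import actually connected to?
cat /proc/fs/lustre/mdc/umt3-MDT0000-mdc-*/import

# Per-client way to stop using an OST without remounting: deactivate the
# OSC by its "lctl dl" device number (20 = umt3-OST0019 in the listing above)
lctl --device 20 deactivate

# Multiple excludes at mount time: mount.lustre(8) documents exclude= as a
# colon-separated list of OSTs, not a comma list or repeated options
mount -o localflock,exclude=umt3-OST0019:umt3-OST0020 -t lustre \
    10.10.1....@tcp0:/umt3 /lustre/umt3

# Server side (target unmounted): inspect the parameters recorded on the
# MDT, including any stale failover NIDs left over from the HA setup
tunefs.lustre --print /dev/mdt_dev

# Usual recipe to drop 10.10.1.49 from the configuration: unmount all
# targets and clients, then regenerate the config logs with only the
# surviving NID (omit --mgsnode if the MGS is co-located on this MDT)
tunefs.lustre --erase-params --mgsnode=10.10.1.48@tcp --writeconf /dev/mdt_dev
# ...then --writeconf each OST, and remount the MDT first, OSTs next,
# clients last
```

The key point for the "wrong host" symptom is that clients learn server NIDs from the MGS configuration logs, not from /etc/fstab, so a stale failover NID recorded on the targets will keep resurfacing until the logs are regenerated.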
