Well, you are absolutely right, it is a timeout talking to what it THINKS is the MDT. The thing is, it is NOT!

We were set up for HA on the MDT, with 10.10.1.48 and 10.10.1.49 watching and talking to one another. The Red Hat HA service was problematic, so right now 10.10.1.48 is the MDT and has /mnt/mdt mounted, while 10.10.1.49 is being used to do backups and has /mnt/mdt_snapshot mounted. The actual volume is an iSCSI target. So, somehow, the client node has found and is talking to the wrong host! Not good. Scary. Got to do something about this..... Suggestions appreciated....
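For reference, this is roughly how I am checking what the client is really attached to, and what the MDT itself claims about failover. Treat it as a sketch against 1.8.4: the mdc "import" parameter path is from memory, and /dev/<mdt_device> is a placeholder for our actual iSCSI device.

# On the client: the MDC import state should show the NID the client is
# actually connected to (device name as reported by "lctl dl").
lctl get_param mdc.umt3-MDT0000-mdc-*.import
# equivalently:
cat /proc/fs/lustre/mdc/umt3-MDT0000-mdc-*/import

# On each MDS head (10.10.1.48 and 10.10.1.49): confirm which node
# really has the MDT mounted and serving right now.
mount -t lustre
cat /proc/fs/lustre/devices

# On the node holding the MDT device: if the target was formatted with
# --failnode listing both heads, then a client timing out on one NID and
# retrying the other is normal failover behavior, not a lost client.
tunefs.lustre --print /dev/<mdt_device>

If tunefs.lustre shows 10.10.1.49 registered as a failover NID, that would suggest the client is only doing what the target configuration tells it to.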
bob

On 12/14/2010 11:57 AM, Andreas Dilger wrote:
> The error message shows a timeout connecting to umt3-MDT0000 and not the OST.
> The operation 38 is MDS_CONNECT, AFAIK.
>
> Cheers, Andreas
>
> On 2010-12-14, at 9:19, Bob Ball <[email protected]> wrote:
>
>> I am trying to get a lustre client to mount the service, but with one or
>> more OSTs disabled. This does not appear to be working. Lustre version
>> is 1.8.4.
>>
>> mount -o localflock,exclude=umt3-OST0019 -t lustre
>> 10.10.1....@tcp0:/umt3 /lustre/umt3
>>
>> dmesg on this client shows the following during the umount/mount sequence:
>>
>> Lustre: client ffff810c25c03800 umount complete
>> Lustre: Skipped 1 previous similar message
>> Lustre: mgc10.10.1....@tcp: Reactivating import
>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Excluding
>> umt3-OST0019 (on exclusion list)
>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Skipped 1
>> previous similar message
>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>> x1354682302740498 sent from umt3-MDT0000-mdc-ffff810628209000 to NID
>> 10.10.1...@tcp 0s ago has failed due to network error (5s prior to
>> deadline).
>> r...@ffff810620e66400 x1354682302740498/t0
>> o38->[email protected]@tcp:12/10 lens 368/584 e 0 to 1 dl
>> 1292342239 ref 1 fl Rpc:N/0/0 rc 0/0
>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
>> previous similar message
>> Lustre: Client umt3-client has started
>>
>> When I check following the mount, using "lctl dl", I see the following,
>> and it is clear that the OST is active as far as this client is concerned.
>>
>> 19 UP osc umt3-OST0018-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>> 20 UP osc umt3-OST0019-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>> 21 UP osc umt3-OST001a-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>
>> Two questions here. The first, obviously, is: what is wrong with this
>> picture? Why can't I exclude this OST from activity on this client? Is
>> it because the OSS serving that OST still has the OST active? If the
>> OST were deactivated or otherwise unavailable on the OSS, would the
>> client mount then succeed in excluding this OST? (OK, more than one
>> question in the group....)
>>
>> Second group: what is the correct syntax for excluding more than one
>> OST? Is it a comma-separated list of exclusions, or are separate
>> excludes required?
>>
>> mount -o localflock,exclude=umt3-OST0019,umt3-OST0020 -t lustre
>> 10.10.1....@tcp0:/umt3 /lustre/umt3
>> or
>> mount -o localflock,exclude=umt3-OST0019,exclude=umt3-OST0020 -t
>> lustre 10.10.1....@tcp0:/umt3 /lustre/umt3
>>
>> Thanks,
>> bob
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
