Follow-up: rebooting the client fixed this issue. I could not remove the kernel modules (lustre_rmmod) and restart lnet even though the filesystem was unmounted, presumably because some transaction was still trying to play out. Is there a better way to do this?
sam a.

On Mar 20, 2011, at 8:41 PM, Samuel Aparicio wrote:

> I am stuck with the following issue on a client attached to a Lustre system.
> We are running Lustre 1.8.5.
> Somehow connectivity to the OST failed at some point and the mount hung.
> After unmounting and re-mounting, the client attempts to reconnect.
> lctl ping shows the client to be connected, and a normal ping to the OSS/MGS
> servers shows connectivity.
>
> Remounting the filesystem results in only some files being visible.
> The kernel messages are as follows:
> ---------
> Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
> Lustre: lustre-OST0003-osc-ffff8110238c7400.osc: set parameter active=0
> Lustre: Skipped 3 previous similar messages
> LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC ^\; administratively disabled
> Lustre: Client lustre-client has started
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous similar message
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
> req@ffff810459ce4c00 x1363662012007464/t0 o8->[email protected]@tcp:28/4 lens 368/584 e 0 to 1 dl 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182 previous similar messages
> Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
> Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203 previous similar messages
> ------------
>
> An ls of the filesystem shows:
>
> drwxr-xr-x 4 amcpherson users 4096 Mar 19 10:38 amcpherson
> ?--------- ? ? ? ? ? compute-2-0-testwrite
> ?--------- ? ? ? ? ? hello
>
> ----------
>
> Other clients on the system are able to mount and see the files perfectly well.
>
> Can anyone help with what the errors above imply?
>
> A simple network connectivity issue does not seem to be the case here,
> yet the client's attempts to reconnect to the OST fail.
>
> Professor Samuel Aparicio BM BCh PhD FRCPath
> Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
> 675 West 10th, Vancouver V5Z 1L3, Canada.
> office: +1 604 675 8200 lab website http://molonc.bccrc.ca
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
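
For triage, the kernel messages quoted above can be sorted into three groups: imports set INACTIVE by administrator request, requests that timed out against a given NID, and -EIO errors. The following is a minimal Python sketch, assuming the message formats are exactly as they appear in the quoted log. The function name `summarize` and the regular expressions are illustrative only and are not part of Lustre.

```python
import re
from collections import Counter

# Sample lines copied from the kernel log quoted above (unwrapped).
SAMPLE_LOG = """\
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
Lustre: lustre-OST0003-osc-ffff8110238c7400.osc: set parameter active=0
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
"""

# Hypothetical patterns, written against the message text shown above.
INACTIVE_RE = re.compile(r"setting import (\S+?)_UUID INACTIVE by administrator request")
TIMEOUT_RE = re.compile(r"sent from (\S+?)-osc-\S+ to NID (\S+) .*has timed out")
EIO_RE = re.compile(r"returned rc -5, returning -EIO")


def summarize(log_text):
    """Group log lines by the kind of problem they report."""
    inactive = set()      # targets disabled by administrator request
    timeouts = Counter()  # (target, NID) -> number of timed-out requests
    eio_errors = 0        # count of -EIO results
    for line in log_text.splitlines():
        match = INACTIVE_RE.search(line)
        if match:
            inactive.add(match.group(1))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[(match.group(1), match.group(2))] += 1
            continue
        if EIO_RE.search(line):
            eio_errors += 1
    return {
        "inactive": sorted(inactive),
        "timeouts": dict(timeouts),
        "eio_errors": eio_errors,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

On the sample lines, this reports lustre-OST0003 as administratively inactive and lustre-OST0000 as timing out against 10.9.89.21@tcp, which suggests the log may reflect two separate conditions rather than one. That reading is an inference from the messages, not a confirmed diagnosis.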
