You can deactivate it on the MDS only. That stops new objects from
being allocated on it, which makes it effectively RO for new files.
Leave it active on the clients so they can still access files on it.
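
As a side note on reading the log quoted below: the numbers after
"failed with" are negated Linux errno values, so -28 is ENOSPC (the OST
is out of space) and -16 is EBUSY (the reconnect was refused while the
target was busy). Here is a minimal sketch for decoding them, assuming
the log text is available as a string and the script runs on Linux. The
function name decode_failures and the sample text are only illustrative
and are not part of any Lustre tool.

```python
import errno
import os
import re

# Matches "The <op> operation failed with -<n>", allowing the line
# wrap that mail clients insert before "failed".
FAILED_RE = re.compile(r"The (\w+) operation\s+failed with -(\d+)")


def decode_failures(log_text):
    """Return sorted unique (operation, errno name, description) tuples."""
    found = set()
    for op, code in FAILED_RE.findall(log_text):
        num = int(code)
        # errno.errorcode maps the number to its symbolic name.
        name = errno.errorcode.get(num, "UNKNOWN")
        found.add((op, name, os.strerror(num)))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical sample shaped like the wrapped lines in the log.
    sample = (
        "LustreError: 11-0: The ost_write operation\n"
        "failed with -28\n"
        "LustreError: 11-0: The ost_connect operation\n"
        "failed with -16\n"
    )
    for op, name, text in decode_failures(sample):
        print(f"{op}: {name} ({text})")
```

The set removes the repeated messages, so each operation and error pair
is reported once no matter how often it shows up in the log.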
bob
On 2/15/2011 1:57 PM, Jagga Soorma wrote:
Hi Guys,
One of my clients got a hung lustre mount this morning and I saw the
following errors in my logs:
--
..snip..
Feb 15 09:38:07 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_write operation
failed with -28
Feb 15 09:38:07 reshpc116 kernel: LustreError: Skipped 4755836
previous similar messages
Feb 15 09:48:07 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_write operation
failed with -28
Feb 15 09:48:07 reshpc116 kernel: LustreError: Skipped 4649141
previous similar messages
Feb 15 10:16:54 reshpc116 kernel: Lustre:
6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1360125198261945 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to
NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
Feb 15 10:16:54 reshpc116 kernel: Lustre:
reshpcfs-OST0005-osc-ffff8830175c8400: Connection to service
reshpcfs-OST0005 via nid 10.0.250.47@o2ib3 was lost; in progress
operations using this service will wait for recovery to complete.
Feb 15 10:16:54 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_connect operation
failed with -16
Feb 15 10:16:54 reshpc116 kernel: LustreError: Skipped 2888779
previous similar messages
Feb 15 10:16:55 reshpc116 kernel: Lustre:
6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1360125198261947 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to
NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
Feb 15 10:18:11 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_connect operation
failed with -16
Feb 15 10:18:11 reshpc116 kernel: LustreError: Skipped 10 previous
similar messages
Feb 15 10:20:45 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_connect operation
failed with -16
Feb 15 10:20:45 reshpc116 kernel: LustreError: Skipped 21 previous
similar messages
Feb 15 10:25:46 reshpc116 kernel: LustreError: 11-0: an error occurred
while communicating with 10.0.250.47@o2ib3. The ost_connect operation
failed with -16
Feb 15 10:25:46 reshpc116 kernel: LustreError: Skipped 42 previous
similar messages
Feb 15 10:31:43 reshpc116 kernel: Lustre:
reshpcfs-OST0005-osc-ffff8830175c8400: Connection restored to service
reshpcfs-OST0005 using nid 10.0.250.47@o2ib3.
--
Due to disk space issues on my lustre filesystem, one of the OSTs was
full, and I deactivated that OST this morning. I thought that
operation just puts it in a read-only state and that clients can still
access the data on that OST. After I activated the OST again, the
client reconnected and was okay. How else would you deal with an OST
that is close to 100% full? Is it okay to leave the OST active, and
will the clients know not to write data to that OST?
Thanks,
-J
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss