I did deactivate this OST on the MDS server. So how would I deal with an OST filling up? The OSTs don't seem to be filling up evenly either. How does Lustre handle an OST that is at 100%? Would it not use this specific OST for writes if other OSTs are available with capacity?
Thanks,
-J

On Tue, Feb 15, 2011 at 11:45 AM, Andreas Dilger <[email protected]> wrote:

> On 2011-02-15, at 12:20, Cliff White wrote:
> > Client situation depends on where you deactivated the OST - if you
> > deactivate on the MDS only, clients should be able to read.
> >
> > What is best to do when an OST fills up really depends on what else you
> > are doing at the time, and how much control you have over what the
> > clients are doing and other things. If you can solve the space issue
> > with a quick rm -rf, best to leave it online, likewise if all your
> > clients are trying to bang on it and failing, best to turn things off.
> > YMMV
>
> In theory, with 1.8 the full OST should be skipped for new object
> allocations, but this is not robust in the face of e.g. a single very
> large file being written to the OST that takes it from "average" usage
> to being full.
>
> > On Tue, Feb 15, 2011 at 10:57 AM, Jagga Soorma <[email protected]> wrote:
> > Hi Guys,
> >
> > One of my clients got a hung Lustre mount this morning and I saw the
> > following errors in my logs:
> >
> > --
> > ..snip..
> > Feb 15 09:38:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
> > Feb 15 09:38:07 reshpc116 kernel: LustreError: Skipped 4755836 previous similar messages
> > Feb 15 09:48:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
> > Feb 15 09:48:07 reshpc116 kernel: LustreError: Skipped 4649141 previous similar messages
> > Feb 15 10:16:54 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261945 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
> > Feb 15 10:16:54 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection to service reshpcfs-OST0005 via nid 10.0.250.47@o2ib3 was lost; in progress operations using this service will wait for recovery to complete.
> > Feb 15 10:16:54 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> > Feb 15 10:16:54 reshpc116 kernel: LustreError: Skipped 2888779 previous similar messages
> > Feb 15 10:16:55 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261947 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
> > Feb 15 10:18:11 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> > Feb 15 10:18:11 reshpc116 kernel: LustreError: Skipped 10 previous similar messages
> > Feb 15 10:20:45 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> > Feb 15 10:20:45 reshpc116 kernel: LustreError: Skipped 21 previous similar messages
> > Feb 15 10:25:46 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> > Feb 15 10:25:46 reshpc116 kernel: LustreError: Skipped 42 previous similar messages
> > Feb 15 10:31:43 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection restored to service reshpcfs-OST0005 using nid 10.0.250.47@o2ib3.
> > --
> >
> > Due to disk space issues on my Lustre filesystem, one of the OSTs was
> > full and I deactivated that OST this morning. I thought that operation
> > just puts it in a read-only state and that clients can still access the
> > data from that OST. After activating this OST again, the client
> > reconnected and was okay after this. How else would you deal with an
> > OST that is close to 100% full? Is it okay to leave the OST active, and
> > will the clients know not to write data to that OST?
> >
> > Thanks,
> > -J
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > [email protected]
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.

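For anyone reading this thread later: the return codes in the quoted log are negated Linux errno values, so -28 is ENOSPC (no space left on device) and -16 is EBUSY. The Python sketch below decodes them and also shows one possible way to spot a full or unevenly filled OST from `lfs df`-style text. It is only an illustration under stated assumptions: the sample table, its numbers, the `/reshpcfs` mount point, and the OST0004 entry are made up, the function names are hypothetical, and the 90% threshold is an arbitrary choice.

```python
import errno
import re

# Hypothetical sample shaped like `lfs df` output. The mount point and all
# numbers are invented for illustration; only OST0005 appears in the thread.
SAMPLE_LFS_DF = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
reshpcfs-MDT0000_UUID   100000000    5000000    95000000   5% /reshpcfs[MDT:0]
reshpcfs-OST0004_UUID  1000000000  600000000   400000000  60% /reshpcfs[OST:4]
reshpcfs-OST0005_UUID  1000000000 1000000000           0 100% /reshpcfs[OST:5]
"""

# Matches only OST rows: target name, 1K-blocks, used, available, use%.
OST_LINE = re.compile(
    r"^(\S+-OST[0-9a-fA-F]+)_UUID\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%"
)


def ost_usage(lfs_df_text):
    """Return {ost_name: percent_used}, computed from the used/total columns."""
    usage = {}
    for line in lfs_df_text.splitlines():
        match = OST_LINE.match(line.strip())
        if match:
            name = match.group(1)
            total = int(match.group(2))
            used = int(match.group(3))
            usage[name] = 100.0 * used / total if total else 0.0
    return usage


def nearly_full(usage, threshold_pct=90.0):
    """Return the sorted OST names at or above the (arbitrary) threshold."""
    return sorted(name for name, pct in usage.items() if pct >= threshold_pct)


def decode_rc(rc):
    """Map a negated errno from a LustreError line to its symbolic name."""
    return errno.errorcode.get(abs(rc), "UNKNOWN")


if __name__ == "__main__":
    usage = ost_usage(SAMPLE_LFS_DF)
    print(nearly_full(usage))
    print(decode_rc(-28), decode_rc(-16))
```

On a real client the text would come from running `lfs df` rather than from the sample string, and the column layout should be checked against the installed Lustre version before relying on the regular expression.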
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
