I might be looking at the wrong OST. What is the best way to map an actual /dev/mapper/mpath[X] device to the OST ID used for that volume?
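One possible approach is sketched below under stated assumptions. On an OSS, `lctl get_param obdfilter.*.mntdev` is a commonly used way to list each target's backing device, and `e2label <device>` on an ldiskfs OST is another way to read the target name. The parser assumes each output line looks like `obdfilter.<fsname>-OST<hex index>.mntdev=<device>`. That format and the helper name `map_devices_to_osts` are assumptions for illustration.

```python
import re

# Assumed line format (hypothetical example, not real output from this system):
#   obdfilter.reshpcfs-OST0007.mntdev=/dev/mapper/mpathX
MNTDEV_LINE = re.compile(
    r"^obdfilter\.(?P<target>[\w-]+-OST(?P<index>[0-9a-fA-F]{4}))\.mntdev=(?P<dev>\S+)$"
)


def map_devices_to_osts(output):
    """Return {device: (target_name, ost_index)} parsed from mntdev output."""
    mapping = {}
    for line in output.splitlines():
        match = MNTDEV_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in the assumed format
        # The index in a target name is hexadecimal: OST000a is index 10.
        index = int(match.group("index"), 16)
        mapping[match.group("dev")] = (match.group("target"), index)
    return mapping
```

Feeding it the captured output from oss02 would show which mpath device actually backs OST0007. Note the hexadecimal decoding: an index such as `000a` in a target name is not the same as a decimal mpath suffix.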
Thanks,
-J

On Tue, Feb 15, 2011 at 3:01 PM, Jagga Soorma <[email protected]> wrote:
> Also, it looks like the client is reporting a different %used compared to
> the OSS server itself:
>
> client:
> reshpc101:~ # lfs df -h | grep -i 0007
> reshpcfs-OST0007_UUID 2.0T 1.7T 202.7G 84% /reshpcfs[OST:7]
>
> oss:
> /dev/mapper/mpath7 2.0T 1.9T 40G 98% /gnet/lustre/oss02/mpath7
>
> Here is how the data seems to be distributed on one of the OSSes:
> --
> /dev/mapper/mpath5 2.0T 1.2T 688G 65% /gnet/lustre/oss02/mpath5
> /dev/mapper/mpath6 2.0T 1.7T 224G 89% /gnet/lustre/oss02/mpath6
> /dev/mapper/mpath7 2.0T 1.9T 41G 98% /gnet/lustre/oss02/mpath7
> /dev/mapper/mpath8 2.0T 1.3T 671G 65% /gnet/lustre/oss02/mpath8
> /dev/mapper/mpath9 2.0T 1.3T 634G 67% /gnet/lustre/oss02/mpath9
> --
>
> -J
>
>
> On Tue, Feb 15, 2011 at 2:37 PM, Jagga Soorma <[email protected]> wrote:
>
>> I did deactivate this OST on the MDS server. So how would I deal with an
>> OST filling up? The OSTs don't seem to be filling up evenly either. How
>> does Lustre handle an OST that is at 100%? Would it not use this specific
>> OST for writes if there are other OSTs available with capacity?
>>
>> Thanks,
>> -J
>>
>>
>> On Tue, Feb 15, 2011 at 11:45 AM, Andreas Dilger
>> <[email protected]> wrote:
>>
>>> On 2011-02-15, at 12:20, Cliff White wrote:
>>> > Client situation depends on where you deactivated the OST - if you
>>> > deactivate on the MDS only, clients should be able to read.
>>> >
>>> > What is best to do when an OST fills up really depends on what else you
>>> > are doing at the time, and how much control you have over what the clients
>>> > are doing and other things. If you can solve the space issue with a quick
>>> > rm -rf, best to leave it online; likewise, if all your clients are trying to
>>> > bang on it and failing, best to turn things off. YMMV
>>>
>>> In theory, with 1.8 the full OST should be skipped for new object
>>> allocations, but this is not robust in the face of, e.g., a single very
>>> large file being written to the OST that takes it from "average" usage
>>> to being full.
>>>
>>> > On Tue, Feb 15, 2011 at 10:57 AM, Jagga Soorma <[email protected]> wrote:
>>> > Hi Guys,
>>> >
>>> > One of my clients got a hung Lustre mount this morning and I saw the
>>> > following errors in my logs:
>>> >
>>> > --
>>> > ..snip..
>>> > Feb 15 09:38:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
>>> > Feb 15 09:38:07 reshpc116 kernel: LustreError: Skipped 4755836 previous similar messages
>>> > Feb 15 09:48:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
>>> > Feb 15 09:48:07 reshpc116 kernel: LustreError: Skipped 4649141 previous similar messages
>>> > Feb 15 10:16:54 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261945 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
>>> > Feb 15 10:16:54 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection to service reshpcfs-OST0005 via nid 10.0.250.47@o2ib3 was lost; in progress operations using this service will wait for recovery to complete.
>>> > Feb 15 10:16:54 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
>>> > Feb 15 10:16:54 reshpc116 kernel: LustreError: Skipped 2888779 previous similar messages
>>> > Feb 15 10:16:55 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261947 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
>>> > Feb 15 10:18:11 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
>>> > Feb 15 10:18:11 reshpc116 kernel: LustreError: Skipped 10 previous similar messages
>>> > Feb 15 10:20:45 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
>>> > Feb 15 10:20:45 reshpc116 kernel: LustreError: Skipped 21 previous similar messages
>>> > Feb 15 10:25:46 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
>>> > Feb 15 10:25:46 reshpc116 kernel: LustreError: Skipped 42 previous similar messages
>>> > Feb 15 10:31:43 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection restored to service reshpcfs-OST0005 using nid 10.0.250.47@o2ib3.
>>> > --
>>> >
>>> > Due to disk space issues on my Lustre filesystem, one of the OSTs was
>>> > full, and I deactivated that OST this morning. I thought that operation just
>>> > puts it in a read-only state and that clients can still access the data on
>>> > that OST. After I activated the OST again, the client reconnected and
>>> > was okay. How else would you deal with an OST that is close to
>>> > 100% full? Is it okay to leave the OST active, and will the clients know
>>> > not to write data to that OST?
>>> >
>>> > Thanks,
>>> > -J
>>> >
>>> > _______________________________________________
>>> > Lustre-discuss mailing list
>>> > [email protected]
>>> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>> >
>>> >
>>>
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Principal Engineer
>>> Whamcloud, Inc.
>>>
>>>
>>>
>>>
>>
>
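The `-28` and `-16` in the client log above are negated Linux errno values. 28 is `ENOSPC` ("No space left on device") and 16 is `EBUSY`. The helpers below are a hypothetical sketch that decodes such codes and picks the fullest OSTs out of captured `lfs df -h` output, so candidates for cleanup or deactivation stand out. The column layout is assumed from the single `lfs df -h` row quoted in the thread. The function names and the 90% default threshold are illustrative choices, not Lustre tooling.

```python
import errno


def decode_rc(rc):
    """Map a negative kernel/Lustre return code such as -28 to its errno name."""
    return errno.errorcode.get(-rc, "unknown({})".format(rc))


def full_osts(lfs_df_output, threshold_pct=90):
    """Return (OST UUID, use%) pairs at or above threshold_pct, fullest first.

    Assumes `lfs df -h` rows of the form:
        <fsname>-OSTxxxx_UUID  size  used  avail  use%  mountpoint[OST:n]
    """
    usage = []
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Skip the header, MDT rows, the summary line and inactive-device rows.
        if len(fields) < 6 or "-OST" not in fields[0] or not fields[4].endswith("%"):
            continue
        usage.append((fields[0], int(fields[4].rstrip("%"))))
    hot = [row for row in usage if row[1] >= threshold_pct]
    return sorted(hot, key=lambda row: row[1], reverse=True)
```

For instance, `decode_rc(-28)` returns `'ENOSPC'` on Linux. This matches the failing `ost_write` calls before the OST was deactivated. Comparing the `full_osts` result from a client with `df` on the OSS is only meaningful once each mpath device has been matched to its OST index.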
