Several things could have caused this.  The most likely are:

1) you deactivated the OSTs on the MDS (see the check below), using something like:

# lctl set_param ost.work-OST0001.active=0
# lctl set_param ost.work-OST0002.active=0

2) you set the file striping on the directory to use only OST0, as with:

# lfs setstripe -i 0 .
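
To rule out #1, you can list device status on the MDS (a standard check; 
the exact output format is illustrative):

# lctl dl | grep osc

Each OST's osc device should show "UP" in the state column; one that was 
deactivated on the MDS shows "IN" (inactive) instead.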

I would think that you'd remember #1, so my guess would be #2, which 
could have happened when someone intended to do "lfs setstripe -c 0".  
Do an "lfs getstripe ." to check.  A simple:

"lfs setstripe -i -1 ." in each directory

should clear it up.  Note that existing files will NOT be re-striped, 
but new files will be balanced across the OSTs going forward.
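
For reference, here's roughly what the check and fix look like on an 
affected directory (path hypothetical; getstripe output format varies a 
bit between versions):

# lfs getstripe /lustre/work/results
default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0

# lfs setstripe -i -1 /lustre/work/results
# lfs getstripe /lustre/work/results
default stripe_count: 1 stripe_size: 1048576 stripe_offset: -1

A stripe_offset of 0 is the smoking gun: every new file starts on OST0.  
With -1 the allocator picks starting OSTs itself again.  If you also want 
to rebalance files that already landed on OST0, the usual trick after 
fixing the directory is to copy them so new objects get allocated, e.g. 
"cp file file.tmp && mv file.tmp file".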

Kevin


Aaron Everett wrote:
> Thanks for the reply.
>
> File sizes are all <1GB and most files are <1MB. For a test, I copied a 
> typical result set from a non-Lustre mount to my Lustre directory. Total size 
> of the test set is 42GB. I've included before/after results of lfs df and 
> lfs df -i from a client. 
>
> Before test:
> [r...@englogin01 backups]# lfs df 
> UUID                 1K-blocks      Used Available  Use% Mounted on
> fortefs-MDT0000_UUID 1878903960 129326660 1749577300    6% /lustre/work[MDT:0]
> fortefs-OST0000_UUID 1264472876 701771484 562701392   55% /lustre/work[OST:0]
> fortefs-OST0001_UUID 1264472876 396097912 868374964   31% /lustre/work[OST:1]
> fortefs-OST0002_UUID 1264472876 393607384 870865492   31% /lustre/work[OST:2]
>
> filesystem summary:  3793418628 1491476780 2301941848   39% /lustre/work
>
> [r...@englogin01 backups]# lfs df -i
> UUID                    Inodes     IUsed     IFree IUse% Mounted on
> fortefs-MDT0000_UUID 497433511  33195991 464237520    6% /lustre/work[MDT:0]
> fortefs-OST0000_UUID  80289792  13585653  66704139   16% /lustre/work[OST:0]
> fortefs-OST0001_UUID  80289792   7014185  73275607    8% /lustre/work[OST:1]
> fortefs-OST0002_UUID  80289792   7013859  73275933    8% /lustre/work[OST:2]
>
> filesystem summary:  497433511  33195991 464237520    6% /lustre/work
>
>
> After test:
>
> [aever...@englogin01 ~]$ lfs df
> UUID                 1K-blocks      Used Available  Use% Mounted on
> fortefs-MDT0000_UUID 1878903960 129425104 1749478856    6% /lustre/work[MDT:0]
> fortefs-OST0000_UUID 1264472876 759191664 505281212   60% /lustre/work[OST:0]
> fortefs-OST0001_UUID 1264472876 395929536 868543340   31% /lustre/work[OST:1]
> fortefs-OST0002_UUID 1264472876 393392924 871079952   31% /lustre/work[OST:2]
>
> filesystem summary:  3793418628 1548514124 2244904504   40% /lustre/work
>
> [aever...@englogin01 ~]$ lfs df -i
> UUID                    Inodes     IUsed     IFree IUse% Mounted on
> fortefs-MDT0000_UUID 497511996  33298931 464213065    6% /lustre/work[MDT:0]
> fortefs-OST0000_UUID  80289792  13665028  66624764   17% /lustre/work[OST:0]
> fortefs-OST0001_UUID  80289792   7013783  73276009    8% /lustre/work[OST:1]
> fortefs-OST0002_UUID  80289792   7013456  73276336    8% /lustre/work[OST:2]
>
> filesystem summary:  497511996  33298931 464213065    6% /lustre/work
>
>
>
>
>
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Brian J. Murrell
> Sent: Thursday, March 19, 2009 3:13 PM
> To: [email protected]
> Subject: Re: [Lustre-discuss] Unbalanced load across OST's
>
> On Thu, 2009-03-19 at 14:33 -0400, Aaron Everett wrote:
>   
>> Hello all,
>>     
>
> Hi,
>
>   
>> We are running 1.6.6 with a shared MGS/MDT and 3 OSTs. We run a set 
>> of tests that write heavily, then we review the results and delete the 
>> data. Usually the load is evenly spread across all 3 OSTs. I noticed 
>> this afternoon that the load does not seem to be evenly distributed.
>>     
>
> Striping, file count, and file size all affect OST distribution.  
> Is any of the data involved striped?  Are you writing very few large files 
> before you measure distribution?
>
>   
>> OST0000 has a load of 50+ with iowait of around 10%
>>
>> OST0001 has a load of <1 with >99% idle
>>
>> OST0002 has a load of <1 with >99% idle
>>     
>
> What does lfs df say before and after such a test that produces the above 
> results?  Does it bear out even use amongst the OSTs before and after the 
> test?
>
>   
>> df confirms the lopsided writes:
>>     
>
> lfs df [-i] from a client is usually more illustrative of actual usage.  As 
> I say above, if you can quiesce the filesystem for the test above, do an 
> "lfs df; lfs df -i" before the test and after.  Assuming you were successful 
> in quiescing, you should see the change to the OSTs that your test effected.
>
>   
>> OST0000:
>>
>> Filesystem            Size  Used Avail Use% Mounted on
>>
>> /dev/sdb1             1.2T  602G  544G  53% /mnt/fortefs/ost0
>>     
>
> What's important is what it looked like before the test too.  Your test 
> could have, for example, written a single object (i.e. file) of nearly 300G 
> for all we can tell from what you've posted so far.
>
> b.
>
>

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
