Hi,

The messages indicate that the backend storage is overloaded. You could try 
this. Another option may be to statically set the maximum number of threads 
on the OSS; this should reduce the load on the system and (hopefully) push 
the backlog to your clients.
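
For the thread limit, here is a rough sketch of what this might look like on 
a 1.8 OSS. The value 128 is only a placeholder, not a recommendation, and the 
modprobe file location is an assumption that varies by distribution; please 
check the parameter names against the manual for your exact version:

```shell
# Show how many OST I/O service threads are running and the current ceiling.
lctl get_param ost.OSS.ost_io.threads_started
lctl get_param ost.OSS.ost_io.threads_max

# Lower the ceiling at runtime (128 is a placeholder value).
lctl set_param ost.OSS.ost_io.threads_max=128

# To fix the thread count at module load time instead, add this line to
# /etc/modprobe.conf (or a file under /etc/modprobe.d/, depending on the
# distribution) and reload the Lustre modules on the OSS:
# options ost oss_num_threads=128
```

As far as I know, lowering threads_max at runtime may not stop threads that 
are already running, so the module option is the more dependable way to cap 
the count, at the cost of restarting the OSS services.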

-cf


On 12/06/2012 12:06 PM, Grigory Shamov wrote:
> Hi,
>
> On our cluster, when there is a load on the Lustre FS, at some point it 
> slows down precipitously, and a great many "slow IO" and "slow setattr" 
> messages appear on the OSS servers:
>
> =======
> [2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy IO load
> [2988758.408974] Lustre: Skipped 276 previous similar messages
> [2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
> [2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy IO load
> [2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to heavy IO load
> [2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to heavy IO load
> [2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to heavy IO load
> [2988823.125419] Lustre: Skipped 1 previous similar message
> [2988823.125432] Lustre: scratch-OST0004: slow preprw_write setup 55s due to heavy IO load
> [2988856.236914] Lustre: scratch-OST0004: slow direct_io 33s due to heavy IO load
> [2988856.236922] Lustre: Skipped 323 previous similar messages
> [2988892.543942] Lustre: scratch-OST0004: slow i_mutex 48s due to heavy IO load
> [2988892.543950] Lustre: Skipped 280 previous similar messages
> [2988892.545310] Lustre: scratch-OST0004: slow setattr 55s due to heavy IO load
> [2988892.547328] Lustre: scratch-OST0004: slow parent lock 42s due to heavy IO load
> [2988892.547334] Lustre: Skipped 4 previous similar messages
> [2988958.306720] Lustre: scratch-OST0004: slow setattr 52s due to heavy IO load
> [2988958.306724] Lustre: Skipped 1 previous similar message
> [2988958.310818] Lustre: scratch-OST0004: slow parent lock 59s due to heavy IO load
> [2989040.406738] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
> =========
>
> I wonder if mounting it on the clients with "noatime" and/or changing 
> atime_diff would help to get rid of these Lustre slowdowns? Right now 
> /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server is 60.
>
> I tried Googling it first, and found that apparently "noatime" is not 
> supported in 1.8, and that changing atime_diff is the preferred way?
>
> Could you please advise me which way is better/possible, and how one 
> changes atime_diff? Will it help? Does it require, say, remounting the 
> clients, etc.?
>
> Any ideas and advice would be greatly appreciated! Thank you very much in 
> advance.
>
>
> --
> Grigory Shamov
> HPC Analyst, Westgrid/Compute Canada
> E2-588 EITC Building, University of Manitoba
> (204) 474-9625
>
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
