Hi,

On our cluster, when there is load on the Lustre FS, it at times slows down precipitously, and the OSS servers log a very large number of "slow IO" and "slow setattr" messages:
=======
[2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy IO load
[2988758.408974] Lustre: Skipped 276 previous similar messages
[2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
[2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy IO load
[2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to heavy IO load
[2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to heavy IO load
[2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to heavy IO load
[2988823.125419] Lustre: Skipped 1 previous similar message
[2988823.125432] Lustre: scratch-OST0004: slow preprw_write setup 55s due to heavy IO load
[2988856.236914] Lustre: scratch-OST0004: slow direct_io 33s due to heavy IO load
[2988856.236922] Lustre: Skipped 323 previous similar messages
[2988892.543942] Lustre: scratch-OST0004: slow i_mutex 48s due to heavy IO load
[2988892.543950] Lustre: Skipped 280 previous similar messages
[2988892.545310] Lustre: scratch-OST0004: slow setattr 55s due to heavy IO load
[2988892.547328] Lustre: scratch-OST0004: slow parent lock 42s due to heavy IO load
[2988892.547334] Lustre: Skipped 4 previous similar messages
[2988958.306720] Lustre: scratch-OST0004: slow setattr 52s due to heavy IO load
[2988958.306724] Lustre: Skipped 1 previous similar message
[2988958.310818] Lustre: scratch-OST0004: slow parent lock 59s due to heavy IO load
[2989040.406738] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
=========

I wonder whether mounting the clients with "noatime" and/or changing atime_diff would help get rid of these Lustre slowdowns. Currently, /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server is set to 60.

I tried searching Google first and found that "noatime" is apparently not supported in 1.8, and that changing atime_diff is the preferred approach. Could you please advise which way is better (or even possible), and how one changes atime_diff?
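To see which operations dominate in a log like the excerpt above, a short script can tally the "slow ... due to heavy IO load" messages by target and operation. This is a hypothetical helper, not a Lustre tool: the function name `tally_slow_ops` and the assumption that the messages look exactly like the ones quoted above are mine. The "Skipped N previous similar messages" lines are only summed, because they do not name the operation they refer to.

```python
import re
import sys
from collections import defaultdict

# Matches entries such as "scratch-OST0004: slow setattr 50s due to heavy IO load"
# (format assumed from the excerpt above). The operation name may contain spaces,
# e.g. "journal start" or "preprw_write setup".
SLOW_RE = re.compile(r"(\S+): slow ([\w ]+?) (\d+)s due to heavy IO load")
# Matches rate-limiter lines, e.g. "Skipped 276 previous similar messages".
SKIP_RE = re.compile(r"Skipped (\d+) previous similar messages?")


def tally_slow_ops(log_text):
    """Return ({(target, op): (count, worst_seconds)}, total_skipped).

    Only messages that were actually printed are counted per operation;
    suppressed ("Skipped N") messages are summed separately.
    """
    stats = defaultdict(lambda: [0, 0])
    for target, op, secs in SLOW_RE.findall(log_text):
        entry = stats[(target, op)]
        entry[0] += 1
        entry[1] = max(entry[1], int(secs))
    skipped = sum(int(n) for n in SKIP_RE.findall(log_text))
    return {key: tuple(val) for key, val in stats.items()}, skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally_slow.py saved_console.log
    with open(sys.argv[1]) as f:
        stats, skipped = tally_slow_ops(f.read())
    for (target, op), (count, worst) in sorted(stats.items()):
        print(f"{target} {op}: {count} messages, worst {worst}s")
    print(f"suppressed as similar: {skipped}")
```

For example, saving the OSS console output with `dmesg > oss.log` and running the script (hypothetically named `tally_slow.py`) on that file would show whether setattr, i_mutex or journal waits account for most of the delays.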
Will it help? Does it require, say, remounting the clients? Any ideas and advice would be greatly appreciated! Thank you very much in advance.

--
Grigory Shamov
HPC Analyst, Westgrid/Compute Canada
E2-588 EITC Building, University of Manitoba
(204) 474-9625

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
