Hi all,

a recent posting here (which I can't find at the moment) pointed me to http://jira.whamcloud.com/browse/LU-15, which discusses an issue we seem to be seeing as well: some of our OSS get badly overloaded, and the log says:

    slow journal start 36s due to heavy IO load
    slow commitrw commit 36s due to heavy IO load
    slow start_page_read 169s due to heavy IO load
    slow direct_io 34s due to heavy IO load
    ...

The bugzilla discussion seems to propose a number of workaround steps to take on each OSS, among them setting

    readcache_max_filesize=32M

or

    readcache_max_filesize=0

I have checked the current value of this parameter and found

    readcache_max_filesize=18446744073709551615

which translates to 16 EB (if I counted the powers of 1024 correctly). Am I correct in assuming that this is the default value, and that this default is meant to read as "unlimited"? Or is our OSS configuration just badly messed up?

Also, people recommend pinning the bitmaps in memory. How do you do that?

The preallocation tables all seem to contain "256 512 1024", so no shrinking of prealloc_table is necessary.

The OSTs in question have just reached the 85% fill level. We have a number of older OSS which are closer to 95%; I guess the problem doesn't show up there because there is no room for further files anyhow...

Regards,
Thomas

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
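The power-of-1024 arithmetic in the question can be checked quickly. The reported value 18446744073709551615 is 2^64 - 1, the largest unsigned 64-bit integer, and 2^64 bytes is exactly 16 * 1024^6 bytes, i.e. 16 EiB. The Python sketch below does that check. Treating the all-ones value as an "unlimited" sentinel is an assumption here, not something confirmed from the Lustre source, and `human_binary` / `describe_readcache_limit` are hypothetical helper names used only for illustration.

```python
# Interpret a readcache_max_filesize value as reported on an OSS.
# Assumption (not verified against Lustre source): the parameter is an
# unsigned 64-bit byte count, and 2**64 - 1 acts as an "unlimited" sentinel.

U64_MAX = 2**64 - 1
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human_binary(nbytes: int) -> str:
    """Format a byte count using powers of 1024 (binary prefixes)."""
    value = float(nbytes)
    for unit in UNITS:
        # Stop once the value fits this unit, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024
    raise AssertionError("unreachable")


def describe_readcache_limit(raw: str) -> str:
    """Hypothetical helper: explain a raw readcache_max_filesize string."""
    nbytes = int(raw)
    if nbytes == U64_MAX:
        return "unlimited (2**64 - 1, the largest unsigned 64-bit value)"
    return human_binary(nbytes)


reported = "18446744073709551615"
print(int(reported) == 2**64 - 1)   # True
print(2**64 == 16 * 1024**6)        # True: 2**64 bytes is exactly 16 EiB
print(human_binary(int(reported)))  # 16.00 EiB (the missing byte is lost to float rounding)
print(describe_readcache_limit(reported))
print(describe_readcache_limit(str(32 * 1024**2)))  # the suggested 32M: 32.00 MiB
```

So "16 EB" is right if EB is read as the binary unit EiB; the exact value is one byte short of that.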
