mm/readahead.c has the logic for the ramp-up; it detects sequentiality. See: http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3318.html
On Sat, Sep 26, 2009 at 12:48 AM, Peter Teoh <[email protected]> wrote:
> On Fri, Sep 25, 2009 at 11:29 PM, shailesh jain
> <[email protected]> wrote:
> > Yes, I understand that. For random reads and other non-sequential
> > workloads, the readahead logic will not ramp up to max size anyway.
> > What I want is to bump up the max size, so that when the kernel
> > detects a sequential workload
>
> it puzzled me how to distinguish between sequential and random
> reads... does the kernel actually detect and check that a series of
> reads is contiguous? that does not seem sensible either. read-ahead
> means reading ahead of expectation, so by the time it has detected and
> checked that the series of reads is contiguous, it really is not
> "read-ahead" anymore.
>
> anyway, i did an ftrace stack trace for reading /var/log/messages:
>
> => ext3_get_blocks_handle
> => ext3_get_block
> => do_mpage_readpage
> => mpage_readpages
> => ext3_readpages
> => __do_page_cache_readahead
> => ra_submit
> => filemap_fault
> head-25243 [000] 20698.351148: blk_queue_bounce <-__make_request
> head-25243 [000] 20698.351148: <stack trace>
> => __make_request
> => generic_make_request
> => submit_bio
> => mpage_bio_submit
> => do_mpage_readpage
> => mpage_readpages
> => ext3_readpages
> => __do_page_cache_readahead
> head-25243 [000] 20698.351159: blk_rq_init <-get_request
> head-25243 [000] 20698.351159: <stack trace>
> => get_request
> => get_request_wait
> => __make_request
> => generic_make_request
> => submit_bio
> => mpage_bio_submit
> => do_mpage_readpage
> => mpage_readpages
>
> so from the above, we can guess that __do_page_cache_readahead() is
> the key function involved:
>
> cut and paste (and read the comments below):
>
> /*
>  * do_page_cache_readahead actually reads a chunk of disk.
>  * It allocates all the pages first, then submits them all for I/O.
>  * This avoids the very bad behaviour which would occur if page
>  * allocations are causing VM writeback. We really don't want to
>  * intermingle reads and writes like that.
>  *
>  * Returns the number of pages requested, or the maximum amount of
>  * I/O allowed.
>  *
>  * do_page_cache_readahead() returns -1 if it encountered request
>  * queue congestion.
>  */
> static int
> __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>                         pgoff_t offset, unsigned long nr_to_read)
> {
>         struct inode *inode = mapping->host;
>         struct page *page;
>         unsigned long end_index;        /* The last page we want to read */
>         LIST_HEAD(page_pool);
>         int page_idx;
>         int ret = 0;
>         loff_t isize = i_size_read(inode);
>
>         if (isize == 0)
>                 goto out;
>
>         end_index = ((isize - 1) >> PAGE_CACHE_SHIFT);
>
>         /*
>          * Preallocate as many pages as we will need.
>          */
>         read_lock_irq(&mapping->tree_lock);
>         for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
>                 pgoff_t page_offset = offset + page_idx;
>
>                 if (page_offset > end_index)
>                         break;
>
>                 page = radix_tree_lookup(&mapping->page_tree, page_offset);
>                 if (page)
>                         continue;
>
>                 read_unlock_irq(&mapping->tree_lock);
>                 page = page_cache_alloc_cold(mapping);
>                 read_lock_irq(&mapping->tree_lock);
>                 if (!page)
>                         break;
>                 page->index = page_offset;
>                 list_add(&page->lru, &page_pool);
>                 ret++;
>         }
>         read_unlock_irq(&mapping->tree_lock);
>
>         /*
>          * Now start the IO. We ignore I/O errors - if the page is not
>          * uptodate then the caller will launch readpage again, and
>          * will then handle the error.
>          */
>         if (ret)
>                 read_pages(mapping, filp, &page_pool, ret);
>         BUG_ON(!list_empty(&page_pool));
> out:
>         return ret;
> }
>
> the HEART OF the algo is the last few lines ---> read_pages(), and
> there is no conditional logic in it; it just reads ahead blindly.
>
> > it does not restrict itself to 32 pages.
> >
> > I looked around and saw an old patch that tried to account for the
> > actual memory on the system and set max_readahead according to that.
> > Restricting to arbitrary limits -- for instance, think of a 512MB
> > system vs a 4GB system -- is not sane IMO.
>
> interesting.... can u share the link so perhaps i can learn something?
> thanks pal!!!
>
> > Shailesh Jain
> >
> > On Fri, Sep 25, 2009 at 6:00 PM, Peter Teoh <[email protected]> wrote:
> >>
> >> On Fri, Sep 25, 2009 at 12:05 AM, shailesh jain
> >> <[email protected]> wrote:
> >> > Hi,
> >> > Is the maximum limit of readahead 128KB? Can it be changed by a FS
> >> > kernel module?
> >> >
> >> > Shailesh Jain
> >>
> >> not sure why u want to change that? for a specific performance
> >> tuning scenario (lots of sequential reads)? this readahead feature
> >> is useful only if u are intending to read large files. but if u
> >> switch to different files, say many small files, u defeat the
> >> purpose of readahead. i think this is an OS-independent feature,
> >> which is specifically tuned to the normal usage of the filesystem.
> >>
> >> so, for example, for AIX:
> >>
> >> http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/seq_read_perf_tuning.htm
> >>
> >> their readahead is only (max) 16 x pagesize. not sure how big that
> >> is, but our 128KB should be > 16 x pagesize (how big is our IO
> >> blocksize anyway?)
> >>
> >> for another reputable reference:
> >>
> >> http://www.dba-oracle.com/t_read_ahead_cache_windows.htm
> >>
> >> (in the Oracle database).
> >>
> >> The problem is that if u read ahead too much, and after that the
> >> entire buffer is going to be thrown away unused, then a lot of
> >> time is wasted in reading ahead.
> >>
> >> --
> >> Regards,
> >> Peter Teoh
>
> --
> Regards,
> Peter Teoh
