Graham Leggett wrote: > On Wed, October 25, 2006 11:48 pm, Davi Arnaut wrote: > >> No, we start with a file bucket on the brigade, I will try to explain >> what happens to see if we are talking about the same thing. >> >> Suppose we have a brigade containing a a file bucket, and the file size >> is 4.7GB. We want to read it fully. >> >> When we call apr_bucket_read() on this bucket, we end-up calling the >> bucket read function (file_bucket_read). What does the bucket file read do >> ? >> >> If mmap is supported, it mmaps APR_MMAP_LIMIT bytes (4MB) by creating a >> new mmap bucket and splits the bucket. So, after calling read on a file >> bucket you have two buckets on the brigade. The first one is the mmap >> bucket and last is file bucket with a updated offset. >> >> The same thing happens if mmap is not supported, but the bucket type >> will be a heap bucket. If we don't delete or flush those implicitly >> created buckets they will keep the whole file in memory, but one single >> read will not put the entire file on memory. >> >> What Joe's patch does is remove this first implicitly created bucket >> from the brigade, placing it on the brigade on a temporary brigade for >> sending it to the client. > > Ok, this makes more sense, but in its current form the temporary brigade > should not be necessary.
But it would be better. We don't need to keep creating brigades with apr_brigade_split(), we could only move the buckets to a temporary brigade. Take a look at apr_brigade_split: APU_DECLARE(apr_bucket_brigade *) apr_brigade_split(apr_bucket_brigade *b, apr_bucket *e) { apr_bucket_brigade *a; apr_bucket *f; a = apr_brigade_create(b->p, b->bucket_alloc); /* Return an empty brigade if there is nothing left in * the first brigade to split off */ if (e != APR_BRIGADE_SENTINEL(b)) { f = APR_RING_LAST(&b->list); APR_RING_UNSPLICE(e, f, link); APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link); } APR_BRIGADE_CHECK_CONSISTENCY(a); APR_BRIGADE_CHECK_CONSISTENCY(b); return a; } If we used a tmpbb we could replace the apr_brigade_split with something similar to: if (e != APR_BRIGADE_SENTINEL(b)) { f = APR_RING_LAST(&b->list); APR_RING_UNSPLICE(e, f, link); APR_RING_SPLICE_HEAD(&a->list, e, f, apr_bucket, link); } And instead of destroying the brigade, we could only do a call to: apr_brigade_cleanup(tmpbb); > While disk cache goes through the brigade, it replaces buckets on the > incoming brigade with a bucket referring to the cached file. > > I think a number of the thorny memory problems in mod_*cache have come > about because the brigade is read twice - once by store_body, and once by > ap_core_output_filter. Maybe. Buckets are reference counted and get reused pretty quickly. > The replacing of the buckets with file buckets in the brigade by disk > cache means the brigade will only be read once - by save_body(). Right, I liked your approach. I just wanted to be sure everyone is on the same page on this issue. >> That's why splitting the brigade with magical values (16MB) is not such >> a good idea, because the bucket type knows betters and will split the >> bucket anyway. > > Right, the picture is getting clearer. > > Look closer at the cache code in mod_cache, right at the end of the > CACHE_SAVE filter. > > Notice how store_body() is called, followed by ap_pass_brigade(). Notice > how store_body() is expected to handle the complete brigade, all 4.7GB of > it, before ap_pass_brigade() has a look see. Now - create a 4.7GB file on > your drive. Copy this file to a second filename. Time this and tell me how > long it takes. Do you see the further problem? Yup. > The split-before-save_body patch wasn't described well enough by me - it > is designed to prevent save_body() from having to handle bucket sizes that > are impractically large to handle by a cache, both in terms of memory, and > in terms of time. > That is clear. -- Davi Arnaut