On Mon, October 30, 2006 7:58 am, Justin Erenkrantz wrote:

>> We have a real world case where the cache is expected to process a many
>> MB or many GB file completely, before sending that same response to the
>> network. This is too slow, and takes up too much RAM, resulting in a
>> broken response to the client.
>
> In short, I haven't seen any evidence presented by you or others that
> this is due to a design flaw in the cache/provider abstraction.

+1.

Unfortunately, the patch to mod_cache that solved this problem and kept
the cache/provider abstraction intact was -1'd by Joe Orton.

What I should have done was challenge this point with Joe, but the signal
to noise ratio was too high at that point for there to have been any
chance of that happening.

Hopefully this can be remedied now.

> At
> best, mod_disk_cache could be smarter about storing the file when it's
> large - but that's just a small lines of code to fix - it doesn't
> require any massive changes or new bucket types.  Just copy the file
> bucket and consume that within store_body()'s implementation.

The apr_bucket_read() function, when given a large file, splits the bucket
on each read into a heap bucket and a file bucket.

This works throughout the server, because the most common use case for
this is "read bucket, do something with bucket, delete (newly created
heap) bucket, repeat".

In the cache however, the "delete bucket" step is missing - because the
same brigade needs to later be given to the output filter stack.

So we have a leak. An attempt to cache any file larger than RAM (an ISO
file, any typical video file) results in one file bucket being replaced
with a brigade of heap buckets the size of the file.

Out of memory, boom.

The second problem, even if we had enough memory, is that disk drives are
just too slow. The current implementation will try and write the entire
file from backend to cache disk, before a single byte is written to the
network.

On some drives, this can be minutes later - assuming the browser hasn't
timed out at this point, the human definitely has, boom.

> If that isn't enough, please identify here on list and backed by
> references to the implementation and timings that it is 'too slow',
> 'takes up too much RAM', and results in a 'broken response to the
> client' and why we must break all of our cache structures and
> abstractions.

Hopefully the explanation above makes sense.

I agree with you - changing the abstraction is wrong, I far prefer the
original solution from wednesday last week.

> I'm tired of reviewing large code changes with only vague generalities
> about why the code must change.  These decisions need to be explained
> and reviewed on list before any more code is committed.  Your recent
> commits have been chock full of mistakes and style violations - which
> make it almost impossible to review what's going on.  If you are going
> to commit, please take care to follow all of our style guidelines and
> please ensure the commit works before using our trunk as your personal
> dumping ground.  If your changes are incomplete, feel free to post
> them to the list instead of committing.

The bulk of these patches are a contribution from Niklas Edmundsson who
has put a lot of work into making the cache functional on a high traffic
ftp mirror. These were posted to the list, and added to bugzilla, but this
was not enough to generate any interest or review.

I made the request of Niklas to break the monolithic patch into smaller
reviewable patches, and this was done. I then started reviewing and
committing the patches one by one, starting a few weeks ago.

Repeated requests were made for help reviewing, installing and testing the
patches, but very few reponses were forthcoming, possibly because people
didn't want to get involved in the noisy parts of the thread, or possibly
because they had no time.

Now, a few weeks later, people are suddenly revewing patches committed
weeks ago. I realise that you personally were tied up in infrastructure
issues and so have come late to this discussion, but I have to ask:

Where were the comments when the original patches went in? Of the few
comments that were made, where was the ongoing discussion on these points?

> We're not even close to knowing what we'd be voting on.  So, please
> draft up a proposed design that explains in detail why you think there
> is a problem and what specific changes you propose and why that is the
> best solution.  If you want to submit a patch to go with your
> rationale, cool.  Yet, I certainly don't see any fundamental reason
> why Joe's concerns and mine can't both be addressed in the same
> design.

The current set of patches attempts to solve a list of problems, not a
single problem. The fact that multiple problems were being solved had
muddied the water and made it difficult to review.

I am now trying to address issues one single issue at a time, and the
issue I am trying to resolve here is "solve the out of memory problem when
trying to cache large files".

> I will also note that the mod_cache provider system has explicit
> versioning, so any modifications to the providers should be
> represented with a new version number.  (i.e. providers for version
> "0" should work while offering new features in version "1"-class
> providers.)  We do not arbitrarily tweak the old provider structures
> any more - instead, we introduce new versions.  -- justin

Ok.

Regards,
Graham
--


Reply via email to