On Tue, 19 Jun 2001 [EMAIL PROTECTED] wrote:

> What does this provide us?  We are talking about improving performance by
> a single function call, and either way, we have to handle error
> conditions from those functions.  The code is much easier to read and
> understand if we just use the return code to determine that the function
> doesn't exist.  Consider the following:
>
>         if (bucket->flags & APR_BUCKET_FLAG_SETASIDE) {
>             if (apr_bucket_setaside(...) != APR_SUCCESS) {
>                 /* handle error */
>             }
>         }
>
> When compared to:
>
>         if (apr_bucket_setaside(...) != APR_SUCCESS) {
>             /* handle error */
>         }

I'm imagining some situation where you want to make sure ahead of time
that all of the buckets in a brigade support, say, setaside().  If any one
of them doesn't, then that's really bad news, and you want to punt to a
backup algorithm before you even start trying to setaside() any of them.
For a single call, you're right -- there's no need to check ahead of time.
It's only in these "big picture" situations that you care.
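
To make that concrete, here's a rough sketch of the kind of pre-scan I
mean, assuming the proposed bucket->flags field and a hypothetical
APR_BUCKET_FLAG_SETASIDE bit (neither of which is committed API yet):

    static int brigade_can_setaside(apr_bucket_brigade *bb)
    {
        apr_bucket *e;

        for (e = APR_BRIGADE_FIRST(bb);
             e != APR_BRIGADE_SENTINEL(bb);
             e = APR_BUCKET_NEXT(e)) {
            /* proposed flag: set iff this bucket type implements setaside() */
            if (!(e->flags & APR_BUCKET_FLAG_SETASIDE)) {
                return 0;   /* punt to the backup algorithm up front */
            }
        }
        return 1;           /* safe to setaside() every bucket in the brigade */
    }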


> > "metadata" bucket, and whether the bucket has a file descriptor in it for
> > the use by apr_sendfile().
>
> metadata buckets might be useful, but I am having a hard time seeing it.
> Do we really expect more modules to implement their own metadata buckets?
> Does it matter if they do?  By definition, all buckets implement read
> functions.  A metadata bucket is simply a bucket that always returns no
> data from read.  If a module implements its own metadata bucket, then
> that module needs to be the one to handle that bucket type, so I am
> missing the utility of the metadata flag.

I don't know.  Maybe in one of those situations like the example I gave
above, you want to be sure that all DATA buckets support setaside.  If a
bucket doesn't support setaside and it's a metadata bucket, you couldn't
care less -- just skip over it.  Anyway, I threw this one in because it seemed
like a good idea at the time... maybe it's not that useful.  <shrug>
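
If we did keep the metadata flag, the pre-scan sketch above would just
skip those buckets rather than failing on them -- something like this
(again, the flag names are hypothetical):

        if (e->flags & APR_BUCKET_FLAG_METADATA) {
            continue;   /* metadata bucket: setaside support is irrelevant */
        }
        if (!(e->flags & APR_BUCKET_FLAG_SETASIDE)) {
            return 0;   /* a data bucket we can't set aside: punt */
        }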


> > can be sendfile'd ONLY if it matches APR_BUCKET_IS_FILE(), though the
> > above is clearly a case where an APR_BUCKET_SUPPORTS_SENDFILE() test would
> > be much more useful and extensible.  The flags field gives us that
> > ability.
>
> I am severely missing how this file works.  If it can only be sendfile'd,
> then we are basically saying that no files in the file cache can be sent
> through a filter.

That's correct--almost.  In fact, that's the whole problem we're trying to
solve.  Right now, mod_file_cache has to assume that if ANY content filter
is on the stack, then it must DECLINE the request and punt it over to the
default handler because that filter might try to read from its file
descriptor, which it cannot support.  However, there are several pitfalls
in that logic.  The main problem is that there might be a "benign"
content filter out there (something like content_length or byterange,
perhaps) that looks like a content filter but actually is never going to
read the buckets as long as their length is known ahead of time.  So
currently mod_file_cache can never be used on a request with one of those
filters in the chain, even though there's no reason why it wouldn't work.
The second problem is that the check mod_file_cache uses for content
filters assumes it can just look at the topmost filter in the stack, on
the theory that if any content filters are present, the topmost one must
be a content filter.  If it doesn't see a content filter there, it assumes
it is safe to pass its shared file descriptor down.  But these assumptions
seem very easy to break... the whole thing is just too fragile.  Using a
custom bucket type is much, much more robust and fixes all of these
problems.

The short version of how it works is that mod_file_cache sticks its
apr_file_t into one of these cachedfile buckets and sends it down the
stack.  The cachedfile_create function also takes as parameters the
filename and a pool.  If any filter ever tries to read from the cachedfile
bucket, the cachedfile_read() function reopens the file into the pool that
was passed in at creation-time (namely, r->pool), morphs the bucket into a
real file bucket, and reads from that.  So we're no worse off than if we'd
punted to the default handler in the first place--all it would have done
is open the file into the request pool and create a file bucket to put it
in, which is exactly what we've done--but ONLY if we absolutely have to.
See?
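
To give you a feel for the read path without posting the whole patch yet,
here's a rough sketch; the structure name and helper signatures below are
approximations, not the actual patch:

    /* Sketch only: cachedfile_bucket_data and the exact helper
     * signatures are approximations of the work-in-progress patch. */
    static apr_status_t cachedfile_read(apr_bucket *e, const char **str,
                                        apr_size_t *len,
                                        apr_read_type_e block)
    {
        cachedfile_bucket_data *cf = e->data;
        apr_file_t *fd = NULL;
        apr_status_t rv;

        /* Reopen the file into the pool we were given at creation time
         * (r->pool), so this request gets its own private file handle. */
        rv = apr_file_open(&fd, cf->filename, APR_READ, APR_OS_DEFAULT,
                           cf->pool);
        if (rv != APR_SUCCESS) {
            return rv;
        }

        /* Morph this bucket into a plain file bucket over the new handle
         * and let the regular file bucket code do the actual read. */
        apr_bucket_file_make(e, fd, cf->offset, e->length, cf->pool);
        return apr_bucket_read(e, str, len, block);
    }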

If you want more detail, I'll post the patch to mod_file_cache in a
separate thread.  It's a work in progress at the moment, but there's
enough of it there that you should be able to see what we're trying to
accomplish.


> There is no guarantee that a filter will not try to
> mmap or read from a file.

Hence the need for the smart cachedfile_read() function.

> Support for limiting what can be done with a
> file descriptor should be a part of the apr_file_t structure.  So, if you
> are opening a file to be shared across multiple threads, then that
> information should be stored in the apr_file_t, and then the read/mmap
> functions should look at the apr_file_t and realize that they can't
> operate on that file.  Once that work is done, I don't understand the need
> for the sendfile flag.

But that would mean that an attempt to read from one of these regular file
buckets with a "special" file handle in it would fail.  That would be very
bad.  The right way to do it is that the bucket should just know how to
handle the situation, which is to give the thread its own private file
handle to the file (by reopening it) if and only if the particular request
really calls for it.  This is very different from the actions of regular
file buckets, so a custom bucket type works quite nicely.  All it has to
do is make sure that it keeps its FD at the same offset as the regular
file bucket does, and watch out for any calls to read().  Sure, this logic
could be stuffed into the regular file bucket, but the regular file bucket
logic is complicated enough as it is.  Special type of data storage,
special bucket type... it's a lot cleaner that way.  Besides, who's to say
that somebody else won't come along later and have yet ANOTHER bucket type
that has a file descriptor in it that they want to sendfile()...
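
For what it's worth, the private data such a bucket needs to carry is
pretty small; the read sketch earlier assumed roughly this shape (names
are still approximate):

    /* Approximate shape of the private data a cachedfile bucket carries. */
    typedef struct {
        apr_file_t *shared_fd;  /* the long-lived, shared descriptor       */
        const char *filename;   /* so read() can reopen it per-request     */
        apr_pool_t *pool;       /* pool passed in at creation (r->pool)    */
        apr_off_t   offset;     /* kept in step with regular file buckets  */
    } cachedfile_bucket_data;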


> Can you please post some other situations where these flags are useful.
> In general, I like the idea of having more information in the bucket
> structure, but I want to be sure that when we add this information, we do
> so with very good reasons.

Well, like I said, the main reason I need this is for the sendfile thing.
I'm sure that there are other cases that will arise later on where
somebody (possibly a 3rd party module developer) will want to use a custom
bucket type and will want to have his bucket type get handled by the core
in some special way (e.g., this sendfile situation).  In any of those cases,
it's better to have a flag that says "this bucket supports this special
function" than to have a hardcoded test for one of the canonical bucket
types, which would require hacking the core to work around.  If I can't
convince you of the need for flags that indicate which functions are
implemented (which I hope I already have), then hopefully I can at least
convince you of this.
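
Concretely, the difference in something like core_output_filter would be
roughly this (the flag name is hypothetical):

        /* Hardcoded test: only the one canonical bucket type qualifies,
         * so a third-party bucket with a perfectly good fd is left out. */
        if (APR_BUCKET_IS_FILE(e)) {
            /* ... hand the fd to sendfile ... */
        }

        /* Flag-based test: any bucket type, including ones the core has
         * never heard of, can declare it carries a sendfile-able fd. */
        if (e->flags & APR_BUCKET_FLAG_SENDFILE) {
            /* ... hand the fd to sendfile ... */
        }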


> > If nobody objects, I'll commit this tomorrow, followed closely by the
> > patch to mod_file_cache and the core_output_filter in Apache.
>
> Please give more time than one day when posting such large changes.  This
> really deserves two or three days so that people who aren't reading e-mail
> every day can try to keep up.

No problem.  I'll wait until at least tomorrow, maybe Thursday.

--Cliff



--------------------------------------------------------------
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA

