Re: directory_walk performance

dean gaudet Mon, 25 Jun 2001 12:25:40 -0700
On Mon, 25 Jun 2001, William A. Rowe, Jr. wrote:

> One consideration, depending on module authorship, is if a module
> author chooses to 'do something' in the directory merge operation,
> relative to the request (uri, query elements, etc.)  Does anyone know
> of such a module?

the merge functions have arguments (pool *, void *, void *) ... there's no
knowledge of the request.  this is pretty much deliberate... since they
need to be invoked during config time when no request exists.

...

the cache really should be of request URI -> merged request info (and more
generaly of the entire request header -> filled in request_rec).  i
started to go down this path when i was implementing the "flow cache".
it's very similar to what modern routers / packet filters do so that they
can avoid doing O(N) work on each packet.  (they cache patterns which
match packets which are allowed, and generally include the routing
information in the cache, so that the decision can be made in special
purpose hardware.  a central CPU only needs to decide what goes in the
cache -- the CPU implements the full O(N) search over the configuration.)

this is one of the many reasons that architectures like TUX make so much
sense.  (you know how enamoured i am of the generality of the apache
config language...)

below is the README from
<http://arctic.org/~dean/apache/2.0/flow-00.tar.gz>.

-dean

HTTP FLOWS

A "flow" is a pattern that maps a request to a response.  At a minimum
the pattern includes a URL.  More complex patterns are possible, such
as patterns involving the various "Accept" headers, a Cookie header,
or an Authorization header.

Pattern matching is very simple -- only exact matches are supported.

Patterns can be broken into three categories (possibly more):

    - object -- mapping URLs to filenames
    - negotiation -- distinguishing multiple mappings for the same URL
    - authorization -- permitting or forbidding a request based on the
        presence or absence of headers

In the interest of efficiency authorization patterns will be
shared between multiple flows.  It is easiest to think of this as an
implementation of the <Directory> containers.  If an entire tree /private
is protected by Basic auth, then there will be an authorization pattern
corresponding to /private.  Conceptually the pattern is essentially
a list of permitted Authorization headers, and request containing an
Authorization header in the list is permitted.

It is not feasible in general to pre-calculate all the flows for
a website.  In fact that's not interesting.  What is interesting is
dynamic calculation of flows.  Let's call a "flow cache" a set of flows
that describe a subset of the responses for a website.  If a flow is
in the cache then we know for certain what to respond with, and we can
immediately send the response without further processing.  If a flow
isn't in the cache then we need to perform more expensive processing,
and while doing that we may add a flow to the cache to handle future
requests matching similar patterns.

Consider this httpd.conf fragment:

    DocumentRoot /www/htdocs

    <Directory /www/htdocs/private>
        BrowserMatch Mozilla/2 go_away
        order allow,deny
        allow from all
        deny from env=go_away
    </Directory>

The directory container blocks access by User-Agent's containing
"Mozilla/2".  (Or something like that, can't be bothered to look up
the syntax.)

Initially the flow cache is empty.

Suppose we get a request for "/".  The flow processor can't find any
match, so it punts.  The full request processing is performed -- this is
pretty much the same as the current Apache API.  During this process we
calculate that the url "/" maps to the file "/www/htdocs/index.html",
and there are no alternatives to negotiate, and no authorization
to restrict access.  So we insert a flow with the pattern "/ ->
/www/htdocs/index.html" into the flow cache.

Actually we wouldn't quite insert that.  If we're running on WIN32
we'd insert a pattern "/" -> file_handle, where file_handle is an open
read-only handle for /www/htdocs/index.html.  This way we could call
TransmitFile directly.  Similarly on unix we'd use mmap().  Or maybe we'd
actually map it to a database object cached in memory... it's essentially
a mapping to a handler and an object.  Also as part of this map we insert
cached content-length, last-modified, and content-type values used to
build the response headers.

Similar when we get requests for "/logo.gif", or "/warez.txt" we can
insert them into the cache.

When a request comes for "/private/" we have to build an authentication
pattern.  An auth pattern is a list of headers and the values they must
match.  Suppose this first request for "/private/" comes from a mozilla/2
client.  We build an auth pattern for <Directory /www/htdocs/private>,
indicating that User-Agent must match one of {} ... must match an empty
list.  i.e. it will never match.  We will also build the "/private/"
-> /www/htdocs/private/index.html mapping, and it will refer to the
<Directory /www/htdocs/private> auth pattern.

If we then get a request for "/private/" from a "Lynx/2.6 libwww-FM/2.14"
client, it won't match in the flow cache.  But we will permit
it during the full request processing... and then we can insert
"Lynx/2.6 libwww-FM/2.14" into the User-Agent list for <Directory
/www/htdocs/private>.  Subsequent requests from that same user agent
to /private/ will match in the flow-cache and be served without full
processing.

We can continue this way, adding urls such as "/private/part.gif",
and "/private/memo.html".  They all share the same auth pattern, which
means they share updates to the list of permitted User-Agents.

Similar techniques can be used to build negotiation patterns.  We take
the simple approach in every case.  Rather than cache complex expressions
(such as the list of all negotiation possibilities for an object, with
q-values and whatever else), we cache the exact set of headers which are
acceptable and match them.  Anything which doesn't match goes through
the full processing (and that will insert a record to help match the
next time).  Similar to auth patterns, some effort should be put into
"coalescing" negotiation patterns.

For example, Navigator sends the same Accept/Accept-Language headers
on almost every request (the exact contents can be modified by the
user...).  We only need to recognize that set of headers, and then
any server negotiation involving those headers can refer to the
common pattern.

It's my feeling that a large percentage of requests fit into the above
mold.  There's some obvious extensions too, such as negative caching --
for example, a pattern denying an IP address could be a map to an error
handler and an error code.

WHY WHY WHY?

Why do we want to do all this?  Speed.  The prototype implementation
here is 40% to 50% faster than apache-nspr... this was with a hardwired
test across loopback, but without keep-alive.

Here's two reasons I can think of for the speed increase:

- Simplistic hand coded parser.  The parser doesn't handle all cases,
  it shouldn't have to.  It bails to the full request parser in any
  case it can't handle.  In fact it's so simple that it doesn't
  even have buffered i/o -- if the request isn't in the first packet
  sent from the client it bails and passes it down to full request
  parsing.

- Fewer threads.  The prototype has two threads.  One accepts
  connections, reads requests, and parses them.  The other handles
  sending responses.  Fewer threads means fewer context switches.
  But more importantly it means less memory required for things such
  as stacks... and that results in less L1/L2 cache impact... which
  can have a dramatic effect on performance.

CAVEATS

The prototype has a pre-seeded flow cache.  In practice there are
locking issues involved with seeding the flow cache.  Using a message
queue will allow us to batch updates and hopefully incur less locking
overhead.

The prototype and the discussion above don't deal with cache coherency.
This is an area I'm still working on.  A rough first approximation is
to flush the entire cache periodically.  There's a bunch of other
options... each pattern has different criteria for coherency.

Obviously doing this will require API changes, and configuration language
changes.  The above example would be extremely difficult to "compile"
into the form required for the flow cache.  But I think we can make a
user-friendly language that is easily compilable into flows on the fly.
Re: directory_walk performance

Reply via email to