Re: NewCache - a requirements spec v0.01

Chuck Murcko 23 Feb 2001 11:49:41 -0000

Sweet. Elegant, extensible, and efficient by virtue of placing the "Cache Out" 
content generator ahead of the regular request processing.


This is interesting: what happens if you replace the browser in your diagrams 
with another Apache + cache? It seems like a reasonable way to do cache 
synchronization amongst a server cluster, or an akamai-type distributed cache. 
The caches could transfer compressed objects between themselves, so compression 
would happen only once per object no matter how many caches are maintaining 
sync.

It also seems that the proxy function fits nicely into this scheme:

              +----------------------------------------+
              |         Browser|Server|Proxy           |
              +----------------------------------------+
                  |                ^             ^  ^
                  |                |             |  |
                  v                | Y           |  |
           +-----------+  Y      +-----------+   |  |
           | Cache Out |-------->| Cache Out |   |  +-----+
           | in cache? |         | fresh?    |   |        |
           +-----------+         +-----------+   |    +----------+
                  | N                      | N   |    | Cache In |
                  | +-------------------+  |     |    | serve    |
                  +-| Cache Out         |<-+     |    | from     |
                  | | force conditional |        |    | cache    |
                  | +-------------------+        |    +----------+ 
                  |                              |        |
                  v                              |        |
      +-----------------------+                  |        |
      | Proxy protocol select | Y                |        |
      |        filter         | --+              |        |
      |     Proxy request?    |   |              |        |
      +-----------------------+   |              |        |
                N |               |              |        |
                  |               v              |        |
                  |       +---------------+      |        |
                  |       |  Per-protocol |      |        |
                  |       |    content    |      |        |
                  |       |   generator   |      |        |
                  |       +---------------+      |        |
                  |               |              |        |
                  v               |            N |      Y |
           +-------------+        v     +---------------------+
           | Apache 2.0  |------------->| Cache In            |
           +-------------+              | force conditional & |
                                        | 304 Not Modified?   | 
                                        +---------------------+

A proxy protocol selector (really a glorified multiplexer, similar in function 
to the core logic in the current mod_proxy.c, but built as a filter) determines 
if the incoming request is a proxy request for a supported protocol. If it is 
not, it simply passes the request to Apache (actually the rest of the httpd 
processing chain). If it is a request for a supported protocol, the selector 
chains in a content generator appropriate for that protocol which fetches the 
object and hands it off to Cache In.
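
To make the multiplexer idea concrete, here is a minimal sketch in Python 
rather than httpd C. Every name in it (ProtocolSelector, register, select, 
handle) is hypothetical and is not an httpd or mod_proxy API. It assumes a 
proxy request can be recognised by an absolute URI whose scheme has a 
registered generator; anything else falls through to the local handler.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit

# Hypothetical type: a content generator takes a request target, returns a body.
ContentGenerator = Callable[[str], bytes]


class ProtocolSelector:
    """Toy multiplexer: route proxy requests to a per-protocol generator."""

    def __init__(self) -> None:
        self._generators: Dict[str, ContentGenerator] = {}

    def register(self, scheme: str, generator: ContentGenerator) -> None:
        """Add support for one protocol (for example "http" or "ftp")."""
        self._generators[scheme.lower()] = generator

    def select(self, request_target: str) -> Optional[ContentGenerator]:
        # Assumption: a proxy request carries an absolute URI such as
        # "http://host/path", while a local request carries only "/path".
        scheme = urlsplit(request_target).scheme.lower()
        return self._generators.get(scheme)


def handle(selector: ProtocolSelector,
           request_target: str,
           local_handler: ContentGenerator,
           cache_in: Callable[[str, bytes], bytes]) -> bytes:
    """Run one request through the selector, then hand the result to Cache In."""
    generator = selector.select(request_target)
    if generator is None:
        # Not a proxy request for a supported protocol: use the normal chain.
        generator = local_handler
    return cache_in(request_target, generator(request_target))
```

Adding a protocol is then one register() call, and the cache never needs to 
know which generator produced the object.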

This seems to handle modular protocol support in the proxy cleanly, and does 
not appear to break the 304 Not Modified processing of the cache. The HTTP 
content generator will also have to deal with content negotiation.

This also seems able to work with an encrypting/decrypting SSL/TLS proxy, 
either by adding the appropriate content generators (brute force approach), or 
by having the protocol selector build a two-stage decrypt-then-HTTP or 
HTTP-then-encrypt subchain (better code reuse).

Question for Ryan or other httpd mavens: should the protocol content generator 
feed the rest of the httpd processing chain, instead of Cache In directly? If 
so, the HTTP-then-encrypt subchain would split into HTTP-then-<other httpd 
processing>-then-encrypt.

I've inserted a small correction in your original text at the end.

Nice work!

Chuck

On Thursday, February 22, 2001, at 09:23 PM, Graham Leggett wrote:

> Hi all, 
>  
> This is a preliminary discussion about a proposed caching module in 
> Apache v2.0. It's a sort of a requirements specification, if you will. 
>  
> The design is based entirely on proxy caching described in RFC2616, and 
> is rather tricky - as a result I've tried to describe things very 
> simplistically at the beginning, and then layer on each new piece of 
> complexity so that the big picture is not overwhelming. 
>  
> === 
>  
> mod_cache 
> ========= 
>  
> Requirements 
> ------------ 
>  
> The purpose of any cache is to make the transfer of information through 
> or from a system more efficient. A cache is a tradeoff between a number 
> of attributes; in our case the tradeoffs are: 
>  
> - Bandwidth conservation - We want to transfer as few bytes over the 
> network as possible. 
>  
> - CPU cycle conservation - We want our webservers to do as little 
> crunching as is possible. Less crunching means less computing 
> horsepower, and thus a smaller and faster server. 
>  
> - Memory - We cache data in memory - memory is traded off for 
> performance above. 
>  
> - Disk - We cache data to disk - disk space is traded off for 
> performance. 
>  
> - Caching everything - We cache all data, from static data on disk, to 
> dynamically generated data, to data pulled from another server through a 
> reverse proxy. 
>  
> - We use the control techniques described in RFC2616 as a "public 
> cache". 
>  
> - THE DESIGN MUST BE EASY TO FOLLOW AND UNDERSTAND. 
>  
>  
> Caching - The Simple View 
> ------------------------- 
>  
> There are two tasks a cache module must perform at a basic level: 
>  
> - Place new cached data into the cache 
> - Serve cached data from the cache 
>  
> These two functions are handled by two separate halves of the cache: A 
> content generator "Cache Out", and a filter "Cache In": 
>  
>  
>                 +-------------------------+ 
>                 |         Browser         | 
>                 +-------------------------+ 
>                     |              ^  ^ 
>                     |              |  | 
>                     v              |  | 
>              +-----------+   Y     |  | 
>              | Cache Out |---------+  | 
>              +-----------+            | 
>                     |                 | 
>                     | N          +----------+ 
>                     |            | Cache In | 
>                     |            +----------+ 
>                     v                 ^ 
>              +-------------+          | 
>              |    Apache   |----------+ 
>              +-------------+  
>  
> Very simplistically described, a request from a webbrowser is first 
> intercepted by the "Cache Out" content generator. If the request is 
> cached, the cached data is returned and the request ends immediately. If 
> not, the content generator does nothing and the rest of Apache is 
> responsible for generating the content. 
>  
> At the other end, the "Cache In" filter is responsible for putting 
> content generated by Apache into the cache. This module directs data 
> either to memory or to disk (or a combination of both) depending on the 
> configuration of the cache. 
>  
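A minimal sketch of the simple view, in Python for brevity. The class and 
method names are made up for illustration, and a dict stands in for the 
memory/disk storage; nothing here is actual httpd code.

```python
from typing import Callable, Dict


class SimpleCache:
    """Toy model of the two halves: "Cache Out" in front, "Cache In" behind."""

    def __init__(self) -> None:
        # Stand-in for memory and/or disk storage.
        self._store: Dict[str, bytes] = {}

    def cache_in(self, url: str, body: bytes) -> bytes:
        """Filter: record what the rest of the server generated, then pass it on."""
        self._store[url] = body
        return body

    def cache_out(self, url: str, generate: Callable[[str], bytes]) -> bytes:
        """Content generator: answer from the cache, else let `generate` run."""
        if url in self._store:
            # Cached: the request ends here.
            return self._store[url]
        # Not cached: the rest of the server generates the content, and the
        # response passes through Cache In on its way back to the browser.
        return self.cache_in(url, generate(url))
```

The second request for the same URL never reaches `generate`, which is the 
CPU saving the requirements list asks for.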
>  
> Caching - The Slightly More Complicated View 
> -------------------------------------------- 
>  
> Of course, caching isn't actually this easy. Some complications set in 
> when we note that data is not simply "inside" or "not inside" the 
> cache, but is also of varying freshness. 
>  
> RFC2616 describes mechanisms for specifying how long an item in the 
> cache can remain fresh. When a cached entity expires and is no longer 
> fresh, we do not simply discard the cached data - instead the "Cache 
> Out" content generator modifies the browser request slightly to change 
> the request to a conditional request and hand the browser request down 
> to the rest of Apache. 
>  
> The "Cache In" filter looks at the result of this conditional request. 
> If the result is "304 Not Modified", then the "Cache In" filter fulfils 
> the request from the cache just as the "Cache Out" content generator 
> would have at the start.  
>  
> If the result is not "304 Not Modified" it means there will be new data 
> on the way. The "Cache In" filter places the data in the cache as normal 
> replacing whatever was there before, and the data is passed to the 
> browser as normal. 
>  
>  
>               +----------------------------------------+ 
>               |         Browser                        | 
>               +----------------------------------------+ 
>                   |                ^             ^  ^ 
>                   |                |             |  | 
>                   v                | Y           |  | 
>            +-----------+  Y      +-----------+   |  | 
>            | Cache Out |-------->| Cache Out |   |  +-----+ 
>            | in cache? |         | fresh?    |   |        | 
>            +-----------+         +-----------+   |    +----------+ 
>                   | N                      | N   |    | Cache In | 
>                   | +-------------------+  |     |    | serve    | 
>                   +-| Cache Out         |<-+     |    | from     | 
>                   | | force conditional |        |    | cache    | 
>                   | +-------------------+        |    +----------+  
>                   |                              |        | 
>                   v                            N |      Y | 
>            +-------------+              +---------------------+ 
>            |    Apache   |--------------| Cache In            | 
>            +-------------+              | force conditional & | 
>                                         | 304 Not Modified?   |  
>                                         +---------------------+ 
>  
> In addition to the above, RFC2616 also defines ways to determine whether 
> an object is cacheable or not. Depending on the value of the 
> Cache-Control (and possibly other) headers, the "Cache In" and "Cache 
> Out" modules decide whether an object is cacheable at all. If not, these 
> modules take action to tell the "Storage Manager" (coming soon) to 
> delete the objects from the cache if necessary. 
>    
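The freshness flow in the diagram can be sketched roughly as follows. This 
is an illustration only: it assumes the conditional request is made with an 
entity tag (If-None-Match), that a 304 restarts the freshness lifetime, and 
that the origin is a plain callable. The names Entry, Origin and 
FreshnessCache are invented for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    etag: str
    expires_at: float


# Hypothetical origin: (url, if_none_match) -> (status, body, etag, max_age)
Origin = Callable[[str, Optional[str]], Tuple[int, bytes, str, float]]


class FreshnessCache:
    def __init__(self, origin: Origin,
                 clock: Callable[[], float] = time.time) -> None:
        self._origin = origin
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def handle(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)

        # Cache Out: in cache and fresh, so serve immediately.
        if entry is not None and now < entry.expires_at:
            return entry.body

        # Cache Out: stale copy present, so force a conditional request.
        validator = entry.etag if entry is not None else None
        status, body, etag, max_age = self._origin(url, validator)

        # Cache In: 304 means the stale copy is still good; serve from cache.
        if status == 304 and entry is not None:
            entry.expires_at = now + max_age  # assumption: lifetime restarts
            return entry.body

        # Cache In: new data replaces whatever was cached before.
        self._entries[url] = Entry(body, etag, now + max_age)
        return body
```

Note that a stale entry is never thrown away up front; it is only replaced 
once the origin actually sends new data.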
>  
> Caching - The Plot Thickens 
> --------------------------- 
>  
> Yes, it gets even more complicated, but not really. 
>  
> HTTP/1.1 (RFC2616) supports content negotiation. In a nutshell this 
> means that a single URL can have a number of representations: The 
> language might be different, or the data might have a special content 
> encoding, or it might be compressed. This means that different browsers 
> can get different data in response to the same request for the same URL. 
> The cache needs to handle this in an intelligent fashion. 
>  
> To do this, we break down the cache code again and introduce a new bit: 
>  
> - "Cache Out" - The content generator 
> - "Cache In" - the filter 
> - "Storage Manager" - the bit that handles the actual storing of the 
> data, either on disk or in RAM. 
>  
> To keep the cache code simple we say that the "Cache Out" and "Cache In" 
> modules have no knowledge whatsoever of content negotiation. All they do 
> is give the URL and the request headers to the "Storage Manager", and 
> using the combination of URL and request headers the "Storage Manager" 
> makes the decision as to whether an object is cached or not, or whether 
> an object should be replaced.  
>  
> So, we could see four (or more) different objects in the cache for the 
> same URL, each with their own independently defined freshness, and each 
> treated entirely separately from the other: 
>  
>  
>                                                  +------------+ 
>                                            +-----| Normal     | 
>                           +---------+      |     +------------+ 
>                   +------>| English |------+ 
>                   |       +---------+      |     +------------+ 
>                   |                        +-----| Compressed | 
>    +-------+      |                              +------------+ 
>    |  URL  |------+ 
>    +-------+      |                              +------------+ 
>                   |                        +-----| Normal     | 
>                   |       +---------+      |     +------------+ 
>                   +------>| French  |------+ 
>                           +---------+      |     +------------+ 
>                                            +-----| Compressed | 
>                                                  +------------+ 
>  
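One way to picture the tree above is a store keyed on the URL plus the 
request headers that select a variant. The sketch below assumes only 
Accept-Language and Accept-Encoding matter and compares them as plain 
strings; a real implementation would have to follow the RFC2616 negotiation 
rules instead. StorageManager here is a toy class, not a proposed interface.

```python
from typing import Dict, Mapping, Optional, Tuple

# Assumption for this sketch: only these request headers select a variant.
VARIANT_HEADERS = ("accept-language", "accept-encoding")

Key = Tuple[str, Tuple[str, ...]]


class StorageManager:
    """Toy store holding several independent objects per URL."""

    def __init__(self) -> None:
        self._objects: Dict[Key, bytes] = {}

    @staticmethod
    def _key(url: str, headers: Mapping[str, str]) -> Key:
        # Header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
        return (url, tuple(lowered.get(h, "") for h in VARIANT_HEADERS))

    def store(self, url: str, headers: Mapping[str, str], body: bytes) -> None:
        self._objects[self._key(url, headers)] = body

    def fetch(self, url: str, headers: Mapping[str, str]) -> Optional[bytes]:
        return self._objects.get(self._key(url, headers))

    def variants(self, url: str) -> int:
        """Count how many distinct objects are cached for one URL."""
        return sum(1 for (stored_url, _) in self._objects if stored_url == url)
```

"Cache In" and "Cache Out" only ever pass the URL and headers through, so 
they stay ignorant of content negotiation as described above.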
>  
> The "Storage Manager" is a modular design - add on modules allow you to 
> cache to shared memory, or disk, or to other cache storage mechanisms 
> still to be invented. 
>  
>  
> Caching - The Complicated Bit 
> ----------------------------- 
>  
> Just when you thought that was it! 
>  
> It has been pointed out that storing both compressed and uncompressed 
> versions of the same object representation in the cache is a waste of 
> resources. Although the cache tries very hard to remain transparent to 
> the content that is being cached, there are some optimisations that can 
> be made to speed up the process. The best place for this to happen is in 
> an "Optimisation Layer" sandwiched between the "Cache In" and "Cache 
> Out" modules, and the "Storage Manager". 
>  
>  
>    +-----------+ 
>    | Cache Out |-----+ 
>    +-----------+     |    +--------------------+    +-----------------+ 
>                      +--->| Optimisation Layer |--->| Storage Manager | 
>    +-----------+     |    +--------------------+    +-----------------+ 
>    | Cache In  |-----+ 
>    +-----------+ 
>  
> The optimisation layer is designed to perform some optimisations on the 
> data going into and out of the cache. 
>  
> Some optimisations include: 
>  
> - Compression: 
>  
> If uncompressed data is being put into the "Storage Manager", the 
> "Optimisation Layer" compresses the data before putting it in the cache. 
>  
> If uncompressed data is requested from the "Storage Manager", the

The above should be "If compressed data..."

> "Optimisation Layer" will uncompress the data on the fly before passing 
> it on back to either the "Cache In" or "Cache Out" modules. 
>  
> In both of these cases, neither the "Cache In", "Cache Out" nor "Storage 
> Manager" modules need worry about these optimisations. 
>  
> These optimisations also need not depend at all on other modules in 
> Apache, such as mod_gzip. 
>  
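A rough sketch of the compression optimisation, assuming zlib as the 
compressor and a dict-backed stand-in for the "Storage Manager". The class 
names and the want_compressed / already_compressed flags are invented for 
illustration. Only one compressed copy is ever stored, and it is 
uncompressed on the way out when the caller wants plain data.

```python
import zlib
from typing import Dict, Optional


class DictStorage:
    """Stand-in for the "Storage Manager": it just keeps bytes by key."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def fetch(self, key: str) -> Optional[bytes]:
        return self.objects.get(key)


class OptimisationLayer:
    """Sits between Cache In/Out and storage; keeps one compressed copy."""

    def __init__(self, storage: DictStorage) -> None:
        self._storage = storage

    def put(self, key: str, body: bytes,
            already_compressed: bool = False) -> None:
        # Uncompressed data is compressed exactly once, on the way in.
        data = body if already_compressed else zlib.compress(body)
        self._storage.store(key, data)

    def get(self, key: str, want_compressed: bool = False) -> Optional[bytes]:
        stored = self._storage.fetch(key)
        if stored is None:
            return None
        # Hand back the stored bytes as-is, or uncompress on the fly.
        return stored if want_compressed else zlib.decompress(stored)
```

Neither the caller nor the storage stand-in has to know that compression is 
happening, which matches the point about keeping the other modules unaware.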


Chuck Murcko
Topsail Group
http://www.topsail.org/
