Talking about placing too many irons in the fire :) Paulo Gaspar wrote:
> The "adaptive caching" idea just arrived too early but I hope it is not > forgotten. You can bet your ass it's not :) [excuse my oxford english] The concept I proposed (more than 6 months ago) about adaptive caching is difficult to understand because it's very different from the usual caching approach. Let me try to explain it: when a request arrives to a resource (might also be a pipeline fragment), you have two choices: ask the cache if the resource is still valid or go right ahead and regenerate it. And then, after it's regenerated, should I save it for further use? if so, where? disk or memory? and if the memory is full, what should I throw away? There is a very simple answer for all of this: do what it gives you the best performance. Period. The real problem is: once I've taken a decision (cache or don't cache) how do I know what would have happened performance-wise if I had taken the other choice? This is the key problem. I propose a simple solution: add 'selection noise' and use incoming requests as sampling events on the system to test. It's a trick: the objective is to reduce the time it takes to handle those requests, but I use them to obtain time information on the system and I superimpose a stocastic sampling to the decision-making process. The problem is that sampling uses user requests, so we must reduce the 'noisy' behavior of these requests: my solution is to make this 'selection-noise' a function of the difference between the two paths. So, if going thru the cache system is, on average, 3 times as fast, I use 1 request over 100. Otherwise, if the cache yields 20% improvement (1.2 times as fast), I sample 1 over 5 and so on. This guarantees that: 1) users don't perceive this since the faster is one path, the less frequent the other path is sampled. 2) the system remains stable and adaptive: if sampling the other path reduces the difference, the frequency of sampling increases, thus ensuring a way to 'steer' the decision making following the actual system performance. Sampling sacrifices a small peak performance for adaptibility, but allows the cache system to transparently adapt even to those cases where caching makes it "slower" than regenerating the resource and avoiding storing it into the cache system (which also has a cost). Another almost magic effect of this system is that we have a direct measure of the efficency of the cache: assuming time-locality of actions, I have a way to measure directly the amount of CPU time the cache system saved. How so? Everytime a request comes, I have the information on the 'estimated' time the resource will need to be generated on the different routes. Once the decision is taken, I have the information on how much it took to get it and I can compare it with the "assumed" time that would have taken on the other path. Then I know how much time I *saved* with this choice. But we don't stop here: we also have a way to measure the efficiency of the cache itself between cache hits and cache misses. This information might be used to estimate the RAM a server would require to obtain near-maximum efficiency. And all this without even touching a single configuration!!! A wonderful example of the advancement of adaptivity on caching is that you have an immediate feedback (with numbers!!!) on how much a change, say, in your database monitoring code, influences caching efficiency. 
This is a completely innovative approach: normally, whether or not to cache something is decided a priori by system administrators, but in complex systems very few people can make this decision and tune it for every change in the system. Consider this a hotspot caching machine.

Find attached my original message (WARNING: it's more than 20 printed pages!), maybe this time more people will understand it. :)

--
Stefano Mazzocchi                  One must still have chaos in oneself to be
                                   able to give birth to a dancing star.
<[EMAIL PROTECTED]>                                      Friedrich Nietzsche
--------------------------------------------------------------------
A System for Adaptive Caching
-----------------------------
by Stefano Mazzocchi <[EMAIL PROTECTED]>

While there is no reason to say that the Cocoon architecture is inherently slow, the operations performed on the content are generally heavy and time consuming. Enter the content aggregation world, and improving performance by eliminating unnecessary operations becomes more and more important.

Caching is the act of storing the result of a resource production and avoiding further production, at request time, during the ergodic period of that resource. The 'ergodic period' of a resource is the time period during which the resource doesn't change: in short, when the result of the cache lookup and the resource production are exactly the same.

It must be noted that even the most frequently updated resources (i.e. a page including the time of day) have an ergodic period greater than zero. Even a page that shows the time of day with one-second granularity has an ergodic period of one second.

It must also be noted that caching is *NOT* automatically efficient, as we will show later on.

Final note: we are discussing resources which are generated using a "cacheable" producer *ONLY*. If the producer is not cacheable, caching never takes place.

                              - o -

A general caching model for a single cacheable resource production is:

      +-----(no)------------------> production --------------+
      |                                                       |
  -> cache? -(yes)-> valid? -(yes)----------------> lookup ---+-->
                        |                                     |
                        +--(no)--> production --> storage ----+

Thus, given

   r       := resource
   t       := time of the request
   V(r,t)  := validity estimation cost
   P(r,t)  := production cost when caching is not performed
   Pc(r,t) := production cost when caching is performed
   l(r,t)  := lookup cost
   s(r,t)  := storage cost

and

   I := sequence of invalid request times [1,n]
   V := sequence of valid request times   [1,m]

we have that

                 n
                ---
   Invalid(r) = \   ( V(r,I_i) + Pc(r,I_i) + s(r,I_i) )
                /
                ---
               i = 1

and

                 m
                ---
   Valid(r)   = \   ( V(r,V_j) + l(r,V_j) )
                /
                ---
               j = 1

and

                 n               m
                ---             ---
   Ptot(r)    = \   P(r,I_i) +  \   P(r,V_j)
                /               /
                ---             ---
               i = 1           j = 1

Caching is efficient if and only if

   Invalid(r) + Valid(r) < Ptot(r)                         [1]

Let's make some considerations:

 1) the dimension of 'cost' has not been specified; this means it can include everything that is 'scarce' on the system (memory, CPU, network connections... or even interrupts or multiprocessing degree).

 2) all costs are functions of the resource *and* time. This takes into consideration the fact that each cost is not generally unrelated to the other costs and to the rest of the system. For example, the storage cost may be higher during load peaks since memory consumption for other resource generation might force the system to use virtual memory or leave less CPU time.

 3) disequation [1] answers the boolean question "should we cache this resource?".

It is evident that the solution of the caching question can be determined exactly only when all the necessary information is available. This is the case only when:

 1) all requests have already happened
 2) it's possible to evaluate the single costs independently

Unfortunately, these requirements cannot be satisfied "a priori". This means that it's virtually impossible to obtain optimal efficiency for a physical cache system.

Nevertheless, given

   Ctot(r) = Valid(r) + Invalid(r)

the cache efficiency for the given resource is given by

   eff(r) = Ptot(r) - Ctot(r)

which represents a way to determine "a posteriori" the efficiency of the cache system on the particular resource for a particular set of requests.
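For a purely illustrative, invented numeric check of disequation [1] (taking Pc = P for simplicity): suppose validity estimation costs 2 ms, lookup 5 ms, storage 10 ms and production 100 ms, and that out of 100 requests 90 hit a valid entry. Then

   Invalid(r) = 10 * (2 + 100 + 10) =  1120 ms
   Valid(r)   = 90 * (2 + 5)        =   630 ms
   Ptot(r)    = 100 * 100           = 10000 ms

so caching is clearly efficient (eff(r) = 8250 ms saved). With the same figures but a production cost of only 8 ms,

   Invalid(r) = 10 * (2 + 8 + 10) = 200 ms
   Valid(r)   = 90 * (2 + 5)      = 630 ms
   Ptot(r)    = 100 * 8           = 800 ms

and disequation [1] fails (830 > 800): despite a 90% hit ratio, the cache is making the system slower.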
This gives us information on how well caching behaves on a particular resource and can be used to determine whether or not to turn caching off at a certain point in time.

                              - o -

The goal of a caching system is to be adaptive toward optimal efficiency and to automatically decide whether or not to cache a particular resource based on its previous efficiency history.

The first result of the above model is that site administrators cannot decide whether or not a particular resource needs to be cached, since they don't have a way to measure the efficiency of the cache on that particular resource: they don't have all the necessary information. So:

+----------------------------------------------------------+
| Result #1:                                                |
|                                                           |
| To obtain optimal caching efficiency, the system must be  |
| totally adaptive on all cacheable resources.              |
|                                                           |
| which means (in Cocoon terms)                             |
|                                                           |
| The sitemap should *NOT* contain caching information      |
| since the caching concern and discrimination doesn't      |
| belong to any individual's concern area.                  |
+----------------------------------------------------------+

                              - o -

There are three possible ways to generate a resource:

 1) ---> cache? -(no)--> production --->
 2) ---> cache? -(yes)-> valid? -(no)--> production --> storage -->
 3) ---> cache? -(yes)-> valid? -(yes)-> lookup --->

To be able to calculate theoretical efficiency, the cost of the three states must be available at all times. Since all of them generate the exact same resource, it would be foolish to fire up all of them for each request just to have the exact value of the costs and estimate the ergodic period of that resource.

At this point, it's not possible to go further unless a few properties are assumed about the system. We will concentrate on systems where costs and ergodic periodicity exhibit time locality. This means that the cost of a state is presumed equal to the last known cost for that state, and the ergodic periodicity of the resource equals the average ergodic periodicity exhibited until then.

NOTE: this is a very strong property, even if at first it appears innocent. It's normal to think of caching as being useful under heavy load, thus high request frequencies, thus high time locality for system resources. Since we have shown that request frequency alone doesn't indicate the efficiency of a caching scheme, we are restricting ourselves to optimizing those systems where time locality is generally valid. Since no other assumption is feasible, we'll go forward assuming time locality from here on, but it must be understood that the results from here on depend heavily on this property of the system.

This said, we have the tools to design an algorithm that maximises caching efficiency. This can be reduced to the creation of the caching discriminator that must indicate whether or not caching should be performed. This discrimination is obviously a function of the caching efficiency for the resource: if the cache has proven efficient, the probability that caching will be performed again is directly proportional to that efficiency. At the same time, the chance of performing caching decreases if the efficiency lowers.
Given

   c(eff) := chance of performing caching given a cache efficiency eff

some general properties of the function can be observed:

   - c: [-inf,+inf] ---> [0,1]
   - c(0) = 0.5
   - lim c(eff) = 1   for eff --> +inf
   - lim c(eff) = 0   for eff --> -inf

one possible function is

   c(eff) = 0.5 * tanh(k * eff) + 0.5

where

   tanh(x) := hyperbolic tangent of x
   k       := any positive real number

Thus, the discriminating algorithm is:

   - generate a random real value in [0,1]                  ---> n
   - obtain the caching efficiency for the given resource   ---> eff(r)
   - calculate the chance of caching                        ---> c(eff(r))
   - perform caching if n < c(eff(r))

What is left to define is how efficiency is calculated:

   eff = C1 = C2 = C3 = total = cached = 0

   while (true) {
       reset cost evaluation
       discriminate caching
       if (caching) {
           total++
           eff += C1
           if (valid) {
               perform lookup
               C3 = evaluated cost
               eff -= C3
               cached++
           } else {
               perform production
               perform storage
               C2 = evaluated cost
               eff -= C2
           }
       } else {
           C1 = evaluated cost
           eff += C1
           f = cached / total
           eff -= C2 * (1 - f) + C3 * f
       }
   }

where the use of time locality is evident, since Ptot is always incremented by the last known quantity even if the cost evaluation for state 1 cannot be performed during state 2 or 3, and vice versa.

                              - o -

It must be noted that the above algorithms are totally abstract and don't rely on any particular behavior of the resource generation other than cacheability and time locality. The system is totally automatic, adaptive and stable.

Stability implies that no matter the past resource behavior in terms of caching efficiency, a change in behavior will *always* influence the caching system. This is implied by the fact that

   d[c(eff)]
   ---------  != 0   for every value of eff
    d[eff]

so no matter what the efficiency, there will always be a change in caching behavior. It must be observed that the system "converges" toward optimal caching efficiency but may never reach it if the behavioral properties of the resources change faster than the system can adapt.

The parameter 'k' in the definition of c(eff) is the only tunable point of the system and indicates the slope of the curve at zero efficiency, thus the speed at which the system converges toward a specific behavior. Higher values of k should be used when resources rarely change behavior over time, thus allowing faster convergence; on the other hand, the system will be slower to adapt to behavioral changes. Lower values of k indicate the opposite.

                              - o -

Ok, I know I've lost many of you at this point, and I think the majority of you left will be pretty disappointed to find out that the above doesn't seem to have anything to do with caching as you thought about it before.

Let's follow a more historical approach. When I started to design a cache system, I tried to come up with a rating model for caching: each resource has associated a specific "rate" that normally depends on:

   - number of requests
   - last requested time
   - size
   - production time

then a "rating function" crunches these numbers and comes up with a number associated to this resource. All resources rated higher than a specific threshold were meaningful for caching, the others were not.

This approach is admittedly smarter than what Cocoon1 does today, but it's more or less a grown-up version of the same concept (yes, Cocoon1 uses a simple rate function to drive garbage collection of cached resources).

The real problem lies in the controllability of the system: it's already hard to come up with a "meaningful" rate function (think about it: it's extremely hard!), and the design of an "adaptive" version is even worse.
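For concreteness, here is a minimal Java sketch of the discriminator c(eff) = 0.5 * tanh(k * eff) + 0.5 together with the per-resource bookkeeping of the loop above. Names are illustrative, not Cocoon API, and the cost unit is whatever the cost function returns:

   import java.util.Random;

   /** Illustrative sketch of the adaptive discriminator for a single resource. */
   public class CachingDiscriminator {

       private final double k;                  // tunable slope at eff = 0
       private final Random random = new Random();

       private double eff = 0;                  // accumulated saved cost (Ptot - Ctot)
       private double c1 = 0, c2 = 0, c3 = 0;   // last known costs of states 1, 2, 3
       private long total = 0, cached = 0;

       public CachingDiscriminator(double k) { this.k = k; }

       /** Chance of performing caching given the current efficiency. */
       private double chance() {
           return 0.5 * Math.tanh(k * eff) + 0.5;
       }

       /** Decide whether this request should go thru the cache. */
       public boolean shouldCache() {
           return random.nextDouble() < chance();
       }

       /** Request served thru the cache from a valid entry (state 3: lookup). */
       public void recordCachedHit(double lookupCost) {
           total++; cached++;
           c3 = lookupCost;
           eff += c1 - c3;          // assumed uncached cost minus actual cost
       }

       /** Request served thru the cache with an invalid entry (state 2: production + storage). */
       public void recordCachedMiss(double productionAndStorageCost) {
           total++;
           c2 = productionAndStorageCost;
           eff += c1 - c2;
       }

       /** Request served without caching (state 1: direct production). */
       public void recordUncached(double productionCost) {
           c1 = productionCost;
           double f = (total == 0) ? 0 : (double) cached / total;   // hit ratio so far
           eff += c1 - (c2 * (1 - f) + c3 * f); // actual cost minus assumed cache-path cost
       }
   }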
I got lost several times trying to come up with such an 'adaptive rate function' when I realized that I was looking in the wrong place: variables such as "last requested time" weren't primary, since 'caching efficiency' is simply 'how much the cache saved us'. Once caching efficiency is defined, the cache control problem becomes a maximization effort over a single variable and can be solved much more easily.

The elegance of the caching model expressed here lies in the fact that it is totally agnostic about the 'cost function', which, unlike the 'rate function', must estimate only the 'cost' of generating the resource and not how this will impact the caching efficiency. So the cost function might be as simple as a measure of elapsed time, yet the cache adapts automatically to find a behavior that converges toward optimal caching efficiency within a short number of requests, and it is able to adapt its behavior if the request costs change over time.

Moreover, leaving the 'cost estimation' pluggable allows power users to tune the cache for their specific needs, but without having to mess with the feedback system that stabilizes the caching behavior (which is clearly much less obvious to deal with).

                              - o -

The model has been designed so far around the production of a single resource. How does the model change when more than one resource is produced?

It's easy to show that a caching system with infinite memory doesn't reduce optimal efficiency when multiple resources are handled by the above caching model. Unfortunately, every physical caching system will have a finite amount of memory available for cache storage, and this implies an automatic degradation of caching efficiency.

At first, we suppose that the system has enough cache memory to hold the state information for each resource. This information includes:

   - eff       resource efficiency
   - total     number of requests that underwent caching
   - cached    number of requests that were effectively cached (cached <= total)
   - C1        cost of production for no caching
   - C2        cost of production for invalid caching (C1 < C2)
   - C3        cost of production for valid caching
   - length    length of the cache entry
   - resource  pointer to the cache entry (empty if not present)
   - previous  pointer to the previous resource [eff(current) < eff(previous)]
   - next      pointer to the next resource     [eff(next) < eff(current)]

The collection of resource states composes a linked list ordered by resource caching efficiency. Given a finite size for caching storage, we define

   global_cache_efficiency = efficiency_stored / total_efficiency * 100

where

   efficiency_stored = sum of the positive efficiencies of those resources
                       stored in the cache
   total_efficiency  = sum of the positive efficiencies of all resources

The sum is done over positive efficiencies since negative efficiencies are stored only to provide adaptive information and do not represent correct caching behavior.

A few properties of the global cache efficiency can be observed, given mem := storage memory for the cache:

   1) globEff: [0, +inf] -> [0,100]

   2) lim globEff(mem) = 100   for mem -> +inf

   3) lim globEff(mem) = 0     for mem -> 0

globEff(mem) represents the percentage of the potential cache efficiency that it is possible to take advantage of given the storage capacity. This has a tremendous impact for site administrators since it gives an absolute 'metric' to understand how 'useful' it is to provide more caching memory to the system. Other types of metrics can be imagined, such as cache efficiency per megabyte, etc.
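A hedged Java sketch of what the per-resource state and the globEff() metric could look like; the field names mirror the list above, a sorted list stands in for the hand-rolled linked list, and nothing here is actual Cocoon code:

   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;

   /** Illustrative per-resource cache state plus the global efficiency metric. */
   public class CacheState {

       /** State kept for every known cacheable resource. */
       public static class ResourceState {
           long key;            // unique resource key
           double eff;          // accumulated efficiency (cost saved)
           long total, cached;  // requests that underwent caching / served from cache
           double c1, c2, c3;   // last known costs of the three states
           long length;         // size of the cache entry, if stored
           Object entry;        // the cached entry itself (null if evicted or never stored)
       }

       private final List<ResourceState> resources = new ArrayList<>();

       /** Percentage of the potential (positive) efficiency actually stored in cache. */
       public double globalCacheEfficiency() {
           double stored = 0, totalEff = 0;
           for (ResourceState r : resources) {
               if (r.eff <= 0) continue;        // negative efficiencies only feed adaptation
               totalEff += r.eff;
               if (r.entry != null) stored += r.eff;
           }
           return totalEff == 0 ? 100.0 : stored / totalEff * 100.0;
       }

       /** Keep the list ordered by efficiency so eviction can drop the least efficient entries. */
       public void sortByEfficiency() {
           resources.sort(Comparator.comparingDouble((ResourceState r) -> r.eff).reversed());
       }
   }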
but the important result is given by the fact that, having *all* the necessary cache information for all resources, we are able to *forecast* what the caching behavior can be if parameters such as storage capacity are changed.

It must be noted that such efficiency-based forecasting is only an indicator of behavior, since increasing the memory capacity is likely to increase the caching efficiency itself, so globEff() does not vary linearly with the amount of memory given. But it still provides a simple-to-understand metric for system administrators, who can finally have a way to understand how useful caching has been.

At this point, it is useful to note that eff(r) is the sum of all the cost saved by the cache. In the simple case where cost equals elapsed time, eff(r) represents the time saved by caching on that particular resource since the start of serving operations. It is possible to design a metric that indicates how much cost per request has been saved on that resource, but this requires the addition of another variable which counts the total number of requests (both cached and not cached).

                              - o -

Another important result of this adaptive caching model is that storage costs don't need to be specified. This means that storage can happen in both fast and slower memories and the adaptive algorithm will automatically understand whether it's still efficient to cache a resource on a slower memory.

This shows that cache memory can be added as disk storage when RAM is not available, and the cache system will adapt directly, understanding whether caching is still efficient or not depending on the storage costs.

                              - o -

At this point it's better to sum up a little. So far we have

   - introduced a general model for caching single resources
   - designed adaptive algorithms to maximise caching efficiency
   - showed the impact of limited storage capacity on global efficiency

but we have not

   - defined the 'cost function'
   - defined the implementation of 'lookup' and 'storing'
   - defined the implementation of 'ergodic validation'

Since the general caching model is totally independent of the implementation of these functional blocks, we have separated the concerns, creating an independent adaptive caching system that can work on any implementation. But to be able to build something that works, we must design how these functional blocks behave.

                              - o -

The easiest cost function is "elapsed time": it assumes that the most expensive resource is computing power and that speed is the most valuable property of functional operation. A pure-Java implementation is possible with a time granularity of milliseconds using System.currentTimeMillis(). Smaller granularities require native hooks either to the system or to the JVM using JNI or the JVM Native Profiling Interface. For most serving environments, millisecond granularity is sufficient.

Another possible cost function is "time + memory": this calculates as cost a weighted sum of elapsed time and memory used. This is mainly useful when memory is limited and expensive (embedded systems), so memory use adds to the cost of the production. The cache system will then adapt to optimize the caching efficiency so that both time and memory are used wisely. This normally results in slower performance, but wiser memory consumption. In general, it might even turn caching off for all resources if it's efficient to do so; the beauty of the approach shows here, since the administrator can ignore the problem and let the system adapt.
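A minimal sketch, assuming a hypothetical pluggable CostFunction interface, of the "elapsed time" cost function mentioned above:

   /** Illustrative pluggable cost function; the interface shape is an assumption, not Cocoon API. */
   interface CostFunction {
       /** Called before the measured operation starts. */
       void start();
       /** Called after the operation; returns the evaluated cost. */
       double stop();
   }

   /** Millisecond-granularity elapsed-time cost, using System.currentTimeMillis(). */
   class ElapsedTimeCost implements CostFunction {
       private long startMillis;

       public void start() {
           startMillis = System.currentTimeMillis();
       }

       public double stop() {
           return System.currentTimeMillis() - startMillis;
       }
   }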
A pure-Java implementation of such a "time + memory" cost function is almost impossible due to concurrency: there is no way for a particular thread to know how much memory it has used, since the JVM gives information only about the system free memory and doesn't allow more granularity than that. It is possible, on the other hand, to provide native hooks to perform such thread-based heap use evaluation using the JVM Native Debugging Interface, even if it might result in slower overall performance depending on the JVM implementation.

Of course, some systems might require very specific cost functions, and for this reason it is suggested to make this pluggable and let each administrator implement their own. The only cost function provided out of the box will reasonably be 'elapsed time'.

                              - o -

So far we have not touched any Cocoon-specific concept: the model is general enough to be applied to any caching need, in theory even to CPU design or digital video decompression. From now on, the results will be Cocoon specific.

                              - o -

The act of looking up a store is normally associated with hashing: a resource is mapped to a unique key and this key is used as a hash code into a hash map to look up the stored entry associated to it. This key must be unique in the sense that two different requests that yield the same key will be treated by the cache system as the exact same resource.

While at first the requested URI seems a good candidate for the key, this is a poor choice if the produced content depends on other parameters such as user agent, cookie presence, negotiated content features, etc.

There are two possible solutions:

 a) include all information available to the producer at request time into the key
 b) ask the producer to give us the key

The first option reduces effort for the user, but it's very inefficient since the information necessary to create the key might be very big (imagine hashing the whole session or context content). The second option enforces IoC and gives the concern back to its owner, since the producing logic also contains the information about which parameters influence the uniqueness of a resource. For example, if the production is a function of the URI and User-Agent, I will hash those only and ignore the rest of the environment information available at request time.

+----------------------------------------------------------+
| Result #2:                                                |
|                                                           |
| Each cacheable producer must generate the unique key of   |
| the resource given all the environment information at     |
| request time                                              |
|                                                           |
|   long generateKey(Environment e);                        |
+----------------------------------------------------------+

Since this method is called synchronously it must be as fast as possible. 64 bits (long) guarantee uniqueness even for very big sites.

                              - o -

Once the key is obtained, the cache state information is queried and caching discrimination is performed using the algorithms described before. If caching is performed and the resource entry is present in the cache, ergodic estimation is performed by calling a method the cacheable producer must implement.

+----------------------------------------------------------+
| Result #3:                                                |
|                                                           |
| Each cacheable producer must implement a method that      |
| evaluates ergodicity given the environment information    |
|                                                           |
|   boolean stillValid(Environment e);                      |
+----------------------------------------------------------+

This implements synchronous ergodic validation; in some cases, though, the producer is able to estimate the ergodic period in advance, allowing the cache system to avoid this call (see Result #4 below).
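As a minimal Java sketch of the producer contract expressed by Results #2, #3 and #4 below (the Cacheable and Environment names are assumptions here; the real Cocoon interfaces may differ):

   /** Placeholder for whatever object carries the request-time information (URI, headers, session...). */
   interface Environment { }

   /** Illustrative contract a cacheable producer would implement. */
   interface Cacheable {

       /**
        * Result #2: map the request-time environment to the unique 64-bit key of
        * the resource this producer would generate. Called synchronously on every
        * request, so it must be fast.
        */
       long generateKey(Environment e);

       /**
        * Result #3: synchronous ergodic validation. Returns true if the cached copy
        * generated earlier for the same key is still identical to what would be
        * produced now.
        */
       boolean stillValid(Environment e);

       /**
        * Result #4 (see below): asynchronous ergodic estimation. Returns the
        * estimated ergodic period, or zero if the producer cannot estimate it and
        * stillValid() must be called instead.
        */
       long timeToLive(Environment e);
   }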
+----------------------------------------------------------+
| Result #4:                                                |
|                                                           |
| Each cacheable producer must implement a method that      |
| estimates the ergodic period given the environment        |
| information                                               |
|                                                           |
|   long timeToLive(Environment e);                         |
|                                                           |
| zero indicates the producer is not able to perform        |
| the estimation and synchronous ergodic validation must    |
| be performed.                                             |
+----------------------------------------------------------+

The algorithm for performing ergodic validation is (in pseudo-code):

   if (resource is cached) {
       if (current time < expiration time) {
           get resource from cache
       } else {
           get time2live from producer
           if (time2live != 0) {
               set expiration time = current time + time2live
               produce resource and store it
           } else {
               if (resource is still valid) {
                   get resource from cache
               } else {
                   produce resource and store it
               }
           }
       }
   } else {
       produce resource and store it
   }

which is totally abstracted from the implementation of the actions

   - "get resource from cache"
   - "produce resource and store it"

                              - o -

Assuming that memory represents an ordered collection of randomly accessible bytes, the acts of 'storing a resource' and 'getting a resource' imply the acts of 'serialization' and 'deserialization' of the resource. In many cases these operations are implicit, or done by the system (think about Java object serialization capabilities).

For Cocoon2, the resources are represented as streams of SAX events and the operations of serialization/deserialization must be carefully designed. A possible choice for such serialization/deserialization is the XML compilation method I designed a while back. One objection to the use of this method for caching was its specific asymmetry: SAX serialization (compilation) is slightly slower than deserialization (interpretation). After careful thinking, I believe that:

 1) serialization/deserialization should be pluggable

 2) the cache control system will adapt to the latency and automatically adjust the caching efficiency when the SAX serialization/deserialization code is not suited for that resource.

This said, SAX compilation can be made faster by removing compression at the expense of using more cache memory. It must be noted that global caching efficiency can also be used to measure the impact of SAX compression on caching. This gives another degree of freedom to the cache administrator to tune the system, since increasing SAX compression during serialization might slow down the resource generation but might save cache memory that can be used to increase the efficiency of other resources. The impact of de/serialization costs on resources cannot be judged independently of the statistical properties of the system, just like all other costs.

                              - o -

So far we have analyzed a system where resources were produced by a single actor; now we concentrate on understanding the impact of what has already been designed on resources that are generated from a pipeline.

Let us suppose we have a producer

   G ---> T1 ---> T2 --->

where the components are

   G    := generator
   T1   := first transformer
   T2   := second transformer
   ---> := SAX stream

The question is: is it possible to perform ergodic validation on the resulting producer given the ergodic information for the single components?

Before answering the question, let us observe that given a transformer

   a ---> T ---> b

the result of the production b can be written as

   b = T(a(t),t)

assuming that the behavior of the function itself doesn't change with time (this would require code recompilation!),
and separating the changeable parts, the above can be simplified into

   a ---> T ---> b
          ^
          |
          c

   b = T(a(t), c(a(t),t))

Performing an ergodic validation on such a general 'T' requires the presence of both a(t) and c(t), and normally results in an operation that is algorithmically equivalent to generating the resource b(t); it would then be meaningless to perform, since it would make the caching efficiency automatically negative.

Let us note that, if dc/da = 0, c(t) doesn't depend on a(t), so we can write

   b = T(a(t), c(t))

This means that ergodic validation can be performed on c(t) only and, when c(t) is ergodic, b(t) has the same ergodic properties as a(t).

The importance of this must not be underestimated: in a complex pipeline, the ability to "atomize" ergodic validation is vital for the efficiency of the cache. When such a separation is not possible, caching becomes almost automatically inefficient, since the cost of performing ergodic validation is comparable to that of generating the resource anew.

+----------------------------------------------------------+
| Result #5:                                                |
|                                                           |
| Ergodic validation of a pipeline can be atomized if and   |
| only if the behavior of the component c(t) does not       |
| depend on the input a(t)                                  |
+----------------------------------------------------------+

Let's make a few examples:

 - for XSLT transformation, a(t) represents the document to transform and c(t) represents the stylesheet, thus the instructions to the T() function, which is the XSLT behavior that never changes during execution. Following the rule, the validation is atomic when the stylesheet doesn't depend on the input. This is true only if no <?xml-stylesheet?> PI is present.

+----------------------------------------------------------+
| Result #6:                                                |
|                                                           |
| The use of PIs to drive the operation of a pipeline       |
| breaks, along with SoC, pipeline atomicity and thus must  |
| be considered harmful.                                    |
+----------------------------------------------------------+

 - for XInclude transformation, c(t) is *always* and by definition a function of a(t), since a(t) provides the information that c(t) will use to get the external resources to include. XInclusion performed as a transformer can never be atomized.

+----------------------------------------------------------+
| Result #7:                                                |
|                                                           |
| All those transformers that use part of their input as    |
| instructions for their operations cannot be atomized.     |
+----------------------------------------------------------+

This gives us a metric to understand why it's preferable to move those operations to generation time rather than provide XML parameters (this is the case with the SQLProcessor and LDAPProcessor found in Cocoon1).

It must be noted that lack of atomicity doesn't necessarily imply "bad performance" of the pipeline, but rather decreased optimal cache efficiency. The difference must not be misunderstood.

                              - o -

Let's come back to our pipeline. Assuming all our components exhibit ergodic atomicity, the algorithm to drive the general pipeline

   G -(r[1])-> T[1] -(r[2])-> ... -(r[n])-> T[n] -(r[n+1])->

is

   if (G is not valid) {
       start from G
   } else {
       for (i = 1; i <= n; i++) {
           if (T[i] is not valid) {
               for (j = i; j > 0; j--) {
                   if (r[j] is cached) {
                       send r[j] thru T[j]
                       stop
                   }
               }
           }
       }
       r[n+1] is valid
   }

                              - o -

We have seen how pipeline caching performs optimally only if the components are atomic, and we showed that XInclusion is not.
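Going back to the pipeline-driving algorithm sketched above, a slightly more explicit (and purely illustrative) Java rendering of the same idea follows; the helper types are hypothetical, and the "nothing cached before the invalid transformer" branch, left implicit in the pseudo-code, is handled by restarting from G:

   import java.util.List;

   /** Illustrative pipeline driver: restart from the most downstream cached intermediate. */
   class PipelineDriver {

       interface Validator { boolean isValid(); }
       interface Cache { boolean isCached(int stage); }   // is intermediate r[stage] stored?

       /**
        * Returns the stage to restart from:
        *   0                       -> start from the generator G
        *   j in [1, n]             -> replay cached r[j] thru T[j] and the following transformers
        *   transformers.size() + 1 -> every component is valid, r[n+1] can be served as is
        */
       int restartStage(Validator generator, List<Validator> transformers, Cache cache) {
           if (!generator.isValid()) {
               return 0;                              // start from G
           }
           for (int i = 1; i <= transformers.size(); i++) {
               if (!transformers.get(i - 1).isValid()) {
                   for (int j = i; j > 0; j--) {
                       if (cache.isCached(j)) {
                           return j;                  // send r[j] thru T[j]
                       }
                   }
                   return 0;                          // nothing cached before T[i] (left implicit above)
               }
           }
           return transformers.size() + 1;            // r[n+1] is valid
       }
   }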
Since we indicated that XInclusion is vital for content aggregation, we now try to evaluate the ability to perform XInclusion without breaking pipeline atomicity.

The first thing to observe is that XInclude is, by nature, intrinsically tied to XML, just like namespaces. In the context of XML publishing, it's very limiting to consider one without the other. This seems to imply that XInclusion should be a feature of the system and not a detachable component.

Let us suppose we included XInclude capabilities right into the SAX pipe that transports SAX events from one component to the next. This "pipe connector" must be present anyway because it's the actor that performs SAX serialization and deserialization when the resource is stored in or retrieved from the cache. The regular 'connector' could be drawn as such:

   T ------\-/-----> T
           | ^
           | |
           C D
           | |
           v |
          cache

where

   T := transformer
   C := event compiler
   D := event decompiler

By placing internal XInclude capabilities we transform it into something like

   T ------\-/---[X]--> T
           | ^    ^
           | |    |
           C D    v
           | |  world
           v |
          cache

where

   world := all possible included resources.

Unfortunately, something like this would be totally equivalent to having an XInclude transformer, thus atomicity would be broken again. The solution is making the XInclude component cache aware (unlike all other components, which are cacheable but totally cache agnostic and cannot call the cache directly, to enforce IoC). So:

   T ------\-/---[X]--> T
           | ^    ^
           | |    |
           C D    |
           | |    |
           v |    v
         +----------+
         |  cache   |
         +----------+

It's good to note that xincluded resources can be

   - internal: equivalent to sub-pipeline mounting
   - external: equivalent to server-side inclusion

For internal resources, the cache is able to perform sub-requesting thru the sitemap, and caching the internal resource is a feature we get for free. On the other hand, for external resources the cache doesn't have a "cacheable" actor to call to estimate ergodicity. I see two possible solutions:

   - wrap all external resources with a generator, thus making all of them internal
   - assign a cacheable actor to each external resource based on the protocol

The first is more uniform and elegant, but requires creating sitemap entries for each included resource (which is a rather good practice, IMO, even if it places more work on sitemap administration). The second is more 'hacky', but doesn't need the definition of a sitemap entry for included resources.

Since I believe that the sitemap should contain information about all the resources, even the included ones, to allow the administrators to control what's going on, the first solution seems the most elegant. So:

+----------------------------------------------------------+
| Result #8:                                                |
|                                                           |
| XInclusion works on local URIs only, since all external   |
| resources must be wrapped by a caching actor and defined  |
| in the sitemap.                                           |
+----------------------------------------------------------+

                              - o -

Now that we have found a way to allow XInclusion while maintaining component ergodic atomicity, let's see how the caching algorithm is modified. Given

   G2 -(r1.1)-+
              |
              v
   G1 --(r1)--+-> T -(r2)->

we immediately note that, since (r1) includes the information of (r1.1), the resource cache state must also contain data about which resources are included. This information must be updated at inclusion time by the cache-aware XInclude filter during production. So, if (r1) is valid because G1 says so, the cache information of (r1) is retrieved from cache, and all the included resources are validated.
If all of them are valid and T is valid, then (r2) is valid. Otherwise, if even just one of the included resources is invalid, (r1) is passed again through the XInclude filter, where all the cached resources will be included right out of the cache, while the invalid ones will be regenerated.

+----------------------------------------------------------+
| Result #9:                                                |
|                                                           |
| Each resource must have associated some cache             |
| information that contains the resources that are          |
| included.                                                 |
+----------------------------------------------------------+

                              - o -

So far we have treated the pipelines as if they were composed only of generators and transformers. In short, each pipeline can be seen as a SAX producer. This is a slight difference from the original C2 notion of pipeline, which included a serializer as well, but it has been shown how the addition of XInclusion requires two different terms: pipelines that are used only internally for inclusion, and pipelines that "get out" and must therefore be serialized.

I have the feeling that this requires some sitemap semantics modification, allowing administrators to clearly separate those resources that are visible from the outside (and thus require a serializer) from those that are internal only (and thus don't require one). In general, some resources can be used both internally and externally. This can be done by wrapping an internal resource with a serializer and presenting it to the external world under a different URI.

The only thing left to design is how serialized content is cached. Generally speaking, an externally-visible pipeline such as

   G ---> T ---> S --->

could be cached just like an internal one, as long as no SAX compilation takes place after the serializer. The algorithms for ergodicity validation remain the same. But serializers have particular behavioral properties that allow us to obtain a general result.

Like transformers, serializers have a functional behavior of

   a ---> S ---> b

where

   b = S(a(t),t)

but unlike transformers, serializers must not have external instructions; rather, they are code implementations of 'adapting' functions between the XML world and the outside octet-stream-based world. Thus, since time dependency here means code recompilation and all the instructions are contained in the code, we can assume that

   b = S(a(t))

where b maintains the exact same ergodic properties as a. This means that caching both a and b is useless, since if b is valid so is a, and if a is invalid both a and b must be invalidated. This shows that the cache entry for 'a' is *never* used!

It is a general rule that, for a complete pipeline

   G -r[1]-> T[1] -r[2]-> ... -r[n]-> T[n] -r[n+1]-> S -r[n+2]->

r[n+1] must be stored if and only if the pipeline (G,T1...Tn) is used as an internal included resource; otherwise it's optimally efficient to discard it altogether.

                              - o -

We have reached the point where the Cocoon2 cache system is complete in functionality, content aggregation aware, and totally adaptive in memory consumption and efficiency maximization.

A final note must be made: the request that is most efficiently processed is the one that is not processed at all! The HTTP/1.1 specification [RFC 2616] makes a great effort to design a way to allow a distributed information system to improve performance thru the use of caching.
At this point, it must be noted that even if both this note and the HTTP spec make use of the term caching, they mean different things: we refer to the caching of internal resources, while HTTP caching is about transparently serving on behalf of another server, thus reducing load on that server.

Unfortunately, total transparency cannot be achieved because the proxy and the server don't share the same concerns and don't have the same information about the requested resource. In fact, only the server is able to create the resource: the proxy can only decide whether or not the local copy it might have previously stored is 'meaningful' enough for the client, so that the server request can be avoided and the local copy returned instead. This implies that good proxying behavior depends on the server's 'proxy-friendliness', which assumes that the server provides the proxy with enough information to perform its operation well. Otherwise (and unfortunately, this is almost always the case!!) proxies simply work as gateways that are rarely able to cache information.

So, let's look at the behavior that HTTP/1.1 recommends for proxy-friendly servers, starting by noting that, just like our caching model, there are two ways of performing ergodic estimation:

   - asynchronous, thru the use of expiration time or maximum age
   - synchronous, thru the use of validation-only requests

Being friendly for asynchronous operation simply means providing maximum-age information with the header

   Cache-Control: max-age=<delta seconds>

whenever possible. For Cocoon2 operation, this means that this header will be automatically sent if and only if all the resources that generate the response are cacheable and provide a maximum age higher than zero. Of course, the max-age will be the minimum of the maximum ages of the resources that form the response.

It must be noted that normal operations like XSLT transformation cannot provide a maximum age because there is no information on when the stylesheet might change. On the other hand, it's not normally harmful to serve slightly stale stylesheets, so it's up to the administrator to tune the caching system for their needs.

HTTP/1.1 also provides a means for validating a resource that is cached in the proxy, much like our caching system does when it synchronously calls all cacheable components. This method works by sending a normal request to the server, along with some special "validator" headers that were previously sent by the server during the response generation. The server can use these 'validator' headers to gain knowledge about the entry that is located in the proxy and perform local processing to estimate the validity of that particular cached response. If the server estimates that the proxied response is still valid, it returns an empty response with a 304 (Not Modified) status code; otherwise it returns the full response along with updated 'validator' headers. [please, read section 13 of RFC 2616 for more info]

The validator header that is normally used is "Last-Modified", but any header or combination of headers can be used to gain information on the validity of the request.

It is thus possible, when a request comes in and the Cocoon cache system reports that the stored response is still valid, to return a 304 instead of sending the full body (even if from cache). This reduces network congestion and increases caching efficiency, since the expense of sending the bytes to the client (even if cached) can be avoided.
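A hedged illustration, using the plain Servlet API rather than any actual Cocoon code, of the two proxy-friendly behaviors just described (max-age advertisement and 304 responses to conditional requests):

   import javax.servlet.http.HttpServletRequest;
   import javax.servlet.http.HttpServletResponse;

   /** Illustrative sketch of proxy-friendly response handling. */
   public class ProxyFriendlyResponder {

       /** Send Cache-Control only if all contributing resources report a positive max-age. */
       public void setMaxAge(HttpServletResponse response, long[] maxAgesOfResources) {
           long maxAge = Long.MAX_VALUE;
           for (long age : maxAgesOfResources) {
               if (age <= 0) return;               // one non-cacheable resource disables the header
               maxAge = Math.min(maxAge, age);     // the response expires with its shortest-lived part
           }
           response.setHeader("Cache-Control", "max-age=" + maxAge);
       }

       /**
        * Returns true if a 304 was sent and the body can be skipped entirely;
        * 'lastModified' would come from the cache entry for the serialized response.
        */
       public boolean notModified(HttpServletRequest request,
                                  HttpServletResponse response,
                                  long lastModified) {
           long ifModifiedSince = request.getDateHeader("If-Modified-Since");
           if (ifModifiedSince != -1 && lastModified <= ifModifiedSince) {
               response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);   // 304
               return true;
           }
           response.setDateHeader("Last-Modified", lastModified);
           return false;
       }
   }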
As a general consequence, this implies that storing the result of serialization can always be avoided if Cocoon is wrapped by an HTTP/1.1 proxy that is able to perform response validation.

It is possible to obtain information about 'proxying' using the 'Via:' HTTP header, which (should!) indicate the proxy chain. Unfortunately, there is no way to automatically tell whether Cocoon is completely wrapped by a proxy or whether proxying is performed further down the road. It is therefore advisable that Cocoon implement a caching behavior for serialized responses that depends on the presence of a proxy wrapper (storing the serialized response or not), which can be indicated only by a configuration directive.

Note: Apache's mod_proxy distributed with the 1.3.xx series implements an HTTP/1.0 proxy and for this reason is not able to perform response validation as described here. On the other hand, Squid is partially compliant (can any proxy claim serious HTTP/1.1 compliance today?) and might be able to perform such proxying (sorry, haven't tested it yet!).

Of course, proxying can be cascaded, so a local transparent proxy doesn't impact the performance of remote proxies and doesn't impact proxy-friendliness at all.