Re: mod_proxy distinguish cookies?
FYI: I recently had a long exchange with Microsoft's support regarding the Vary header, and the outcome was that they have at least *documented* their RFC2616 compliance issue:

http://support.microsoft.com/default.aspx?scid=kb;en-us;824847

Best regards, Julian

--
green/bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760
Re: mod_proxy distinguish cookies?
Roy T. Fielding wrote:

I do wish people would read the specification to refresh their memory before summarizing. RFC 2616 doesn't say anything about cookies -- it doesn't have to because there are already several mechanisms for marking a request or response as varying. In this case Vary: Cookie added to the response by the server module (the only component capable of knowing how the resource varies) is sufficient for caching clients that are compliant with HTTP/1.1.

Graham wrote...

My sentence "RFC2616 does not consider a request with a different cookie a different variant" should have read "RFC2616 does not recognise cookies specifically at all, as they are just another header". I did not think of the Vary case, sorry for the confusion. Regards, Graham

"Vary" still won't work for the original caller's scenario.

Few people know this but Microsoft Internet Explorer and other major browsers only PRETEND to support "Vary:". In MSIE's case... there is only 1 value that you can use with "Vary:" that will cause MSIE to make any attempt at all to cache the response and/or deal with a refresh later. That value is "User-Agent". MSIE treats all other "Vary:" header values as if it received "Vary: *" and will REFUSE to cache that response at all.

This means that if you try to use "Vary:" for anything other than "User-Agent" then the browser is not going to cache anything (ever) and will be hammering away at the unlucky nearest target ProxyCache and/or Content Server.

Why in the world an end-point User-Agent would only be interested in doing a "Vary:" on its own name ( which it already knows ) ceases to be a mystery if you read the following link. The HACK that Microsoft added actually originated as a problem report to the Apache Group itself back in 1999...

URI title: Client bug: IE 4.0 breaks with "Vary" header.
http://bugs.apache.org/index.cgi/full/4118

Microsoft reacted to the problem with a simple HACK that just looks for "User-Agent" and this fixed 4.0.
That simple hack is the only "Vary:" support MSIE really has to this day. The following message thread at W3C.ORG itself proves that the "Vary:" problem still exists with MSIE 6.0 ( and other major browsers )...

http://lists.w3.org/Archives/Public/ietf-http-wg/2002AprJun/0046.html

There is also a lengthy discussion about why "Vary:" is a nightmare on the client side at the mod_gzip forum. The discussion centers on the fact that major browsers will refuse to cache responses locally that have "Vary: Accept-encoding" and will end up hammering Content Servers, but the discussion expanded when it was discovered that most browsers won't do "Vary:" at all.

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

As far as this fellow's 'Cookie' issue goes... there is, in fact, a TRICK that you can use ( for MSIE, anyway ) that actually works. Just defeat the HACK with another HACK.

If a COS ( Content Origin Server ) sends out a "Vary: User-Agent" then most major browsers ( MSIE included ) will, in fact, cache the response locally and will 'react' to changes in the "User-Agent:" field when they send out an "If-Modified-Since:" refresh request.

If you create your own pseudo-cookies and just hide them in the 'extra' text fields that are allowed to be in any "User-Agent:" field then Voila... it actually WORKS!

I know that's going to send chills up Roy's spine but it happens to actually WORK OK. Nothing happens other than 'the right thing'. MSIE sees a 'different' "User-Agent:" field coming back and couldn't care less WHAT the value is... it only knows that it's now 'different' and so it just goes ahead and accepts a 'fresh' response for the "Vary:".

If this fellow were to simply 'stuff' his Cookie into the 'extra text' part of the User-Agent: string and send out a "Vary: User-Agent" along with the response then it would actually work the way he expects it to. Nothing else is going to solve the problem with MSIE, I'm afraid, other than this 'HACK the HACK'.

Later...
Kevin
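Roy's point, and the limits of the User-Agent variant of it, can be illustrated with a toy model of how an RFC 2616 cache uses Vary (section 13.6): it remembers the values of the request headers named in Vary and reuses the stored response only when a later request carries the same values. This is a minimal Python sketch, not code from Apache, Squid or any browser. The VaryCache class is made up for illustration, header names are assumed to be lower-cased already, and real caches also normalise values and replace superseded variants. The last case shows that the compared values come from the request, so with Vary: User-Agent a changed cookie only yields a new variant if the browser itself sends a different User-Agent.

```python
from typing import Dict, List, Optional, Tuple

Headers = Dict[str, str]  # header names assumed to be lower-cased already
Entry = Tuple[Dict[str, Optional[str]], str]  # (selecting request values, body)


class VaryCache:
    """Toy cache: per URL, keep each body with the request-header values
    named in its Vary response header."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Entry]] = {}

    @staticmethod
    def _vary_fields(response_headers: Headers) -> List[str]:
        vary = response_headers.get("vary", "")
        return [f.strip().lower() for f in vary.split(",") if f.strip()]

    def store(self, url: str, request_headers: Headers,
              response_headers: Headers, body: str) -> None:
        fields = self._vary_fields(response_headers)
        if "*" in fields:
            return  # "Vary: *" always fails to match, so storing it is pointless
        # Selecting values are read from the REQUEST, never from the response.
        selecting = {f: request_headers.get(f) for f in fields}
        self._entries.setdefault(url, []).append((selecting, body))

    def lookup(self, url: str, request_headers: Headers) -> Optional[str]:
        for selecting, body in self._entries.get(url, []):
            if all(request_headers.get(f) == v for f, v in selecting.items()):
                return body
        return None  # no matching variant: forward the request to the origin


cache = VaryCache()
cache.store("/page", {"cookie": "opts=123"}, {"vary": "Cookie"}, "page for opts=123")
print(cache.lookup("/page", {"cookie": "opts=123"}))  # page for opts=123
print(cache.lookup("/page", {"cookie": "opts=456"}))  # None

# With "Vary: User-Agent" only the request's User-Agent is compared; a
# User-Agent line placed in the *response* is never consulted.
ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
cache.store("/other", {"user-agent": ua, "cookie": "opts=123"},
            {"vary": "User-Agent", "user-agent": "Mozilla/4.0 (compatible; opts=300)"},
            "other page for opts=123")
print(cache.lookup("/other", {"user-agent": ua, "cookie": "opts=456"}))  # still a hit
```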
Re: mod_proxy distinguish cookies?
On Mon, 3 May 2004, Neil Gunton wrote:

Well, that truly sucks. If you pass options around in params then whenever someone follows a link posted by someone else, they will inherit that person's options. The only alternative might be to make pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel directive (which I haven't tried, but I assume that's what it does)... so even though my server will get hit a lot more, at least it might be stopped by the proxy rather than hitting the mod_perl.

The AccelIgnoreNoCache disables a client's Pragma: no-cache, Cache-Control: no-cache and Cache-Control: max-age=number headers.

The AccelIgnoreExpires disables a backend's Expires, Cache-Control: no-cache and Cache-Control: max-age=number headers.

Igor Sysoev
http://sysoev.ru/en/
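Igor's description of AccelIgnoreNoCache can be modelled as a filter applied to the client's request headers before the proxy decides whether its cached copy may be used. This is a rough Python sketch of the described effect only, not mod_accel's code or configuration. The function names are hypothetical, header names are assumed lower-cased, and whether mod_accel still forwards the original headers to the backend is not stated in the thread. AccelIgnoreExpires would be the analogous filter on the backend's response headers.

```python
from typing import Dict, Iterable


def _drop_directives(value: str, names: Iterable[str]) -> str:
    """Remove the named directives from a comma-separated Cache-Control value."""
    drop = {n.lower() for n in names}
    kept = []
    for directive in value.split(","):
        directive = directive.strip()
        if directive and directive.split("=", 1)[0].strip().lower() not in drop:
            kept.append(directive)
    return ", ".join(kept)


def ignore_client_no_cache(request_headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the client's headers without Pragma: no-cache,
    Cache-Control: no-cache and Cache-Control: max-age=N, i.e. the request
    as the cache lookup would see it if those headers were ignored."""
    headers = dict(request_headers)  # lower-cased header names assumed
    if headers.get("pragma", "").strip().lower() == "no-cache":
        del headers["pragma"]
    remaining = _drop_directives(headers.get("cache-control", ""), ["no-cache", "max-age"])
    if remaining:
        headers["cache-control"] = remaining
    else:
        headers.pop("cache-control", None)
    return headers


print(ignore_client_no_cache(
    {"pragma": "no-cache", "cache-control": "no-cache, max-age=0", "accept": "*/*"}))
# {'accept': '*/*'}
```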
Re: mod_proxy distinguish cookies?
[EMAIL PROTECTED] wrote:

If this fellow were to simply 'stuff' his Cookie into the 'extra text' part of the User-Agent: string and send out a Vary: User-Agent along with the response then it would actually work the way he expects it to.

Thanks to Roy and Kevin for your insight. Sorry if this thread is perhaps a bit off-topic for this list, but I hope you can indulge me just a little longer.

When I saw Roy's response regarding the 'Vary' header, I thought that this would be exactly what I was after - you could set 'Vary: Cookie' and then the browser would see that it should re-fetch the page if the cookie has changed. But this didn't seem to work at all in practice. I am testing with the following sequence:

1. Get a page, which has Cache-Control and Expires headers set so that it will be cached
2. Go to another page, where I use a form to change the option cookie
3. The options form sets the cookie and redirects the browser back to the original page
4. The original page is displayed, not the new version - the browser doesn't revalidate.

I have set all the headers, this is an example:

    shell HEAD http://dev.crazyguyonabike.com
    200 OK
    Cache-Control: must-revalidate; s-maxage=900; max-age=901
    Connection: close
    Date: Wed, 05 May 2004 16:08:34 GMT
    Server: Apache
    Vary: Cookie
    Content-Length: 7020
    Content-Type: text/html
    Expires: Wed, 05 May 2004 16:23:35 GMT
    Last-Modified: Wed, 05 May 2004 16:08:34 GMT
    Client-Date: Wed, 05 May 2004 16:08:35 GMT
    Client-Response-Num: 1
    MSSmartTagsPreventParsing: TRUE

So I am setting the Cache-Control to cache the page, and the client is directed to revalidate. I say in the Vary header that the Cookie header must be taken into account. But the browser simply fails to revalidate the original page at all. If I manually refresh then it gets the correct version, but I can't control manual refreshes (or user options) on the browser end.
I would simply love to be able to hit that sweet spot where the browser caches the page, but also sees that some magic component has changed and thus the old version of the page in the cache cannot be used any more.

When I saw Kevin's response, it made perfect sense at first, because what he describes is exactly what I experienced above. Neither Mozilla 1.4 nor IE 6 appears to take any notice of the 'Vary: Cookie' header. I decided to try Kevin's suggestion re the User-Agent field, but after looking at this further I am very confused. The User-Agent field is something that is passed in *from* the client, not *to* the client from a server. Why would IE or any other client even look at a User-Agent field? Ok, ok, I understand, the whole point is that this is a hack, but even so it doesn't seem to work for me. I tried setting the User-Agent field:

    shell HEAD http://dev.crazyguyonabike.com
    200 OK
    Cache-Control: must-revalidate; s-maxage=900; max-age=901
    Connection: close
    Date: Wed, 05 May 2004 16:08:34 GMT
    User-Agent: Mozilla/4.0 (compatible; opts=300)
    Server: Apache
    Vary: User-Agent
    Content-Length: 7020
    Content-Type: text/html
    Expires: Wed, 05 May 2004 16:23:35 GMT
    Last-Modified: Wed, 05 May 2004 16:08:34 GMT
    Client-Date: Wed, 05 May 2004 16:08:35 GMT
    Client-Response-Num: 1
    MSSmartTagsPreventParsing: TRUE

As you can see, I've encoded the opts cookie into the User-Agent header. Am I doing this right? Nothing appears to change; indeed, now IE doesn't even get the proper version when I hit 'Refresh'. Maybe I'm being dense and didn't read the instructions correctly, but it seemed like this was what was being suggested. Once again, I apologize if this is overly obvious or off-topic, but I have the feeling that I'm just missing something obvious here. Any insight would be much appreciated.

In summary, the problem currently appears to be that neither Mozilla nor IE appears to even want to revalidate the original page after the cookie has changed.
When the browser is redirected back to the original page (using identical URL) from the options form, both browsers just use their cached version, without even touching the server at all. No request, nothing. When I use the 'Vary: Cookie' header, then manually refreshing does get the new version. I know that browser settings can determine how often the browser revalidates the page, but I can't tell random users on the internet to change their settings for my site. I would have thought that it should be possible for a page to be cached, and yet still be invalidated by the cookie (or, in the general case, some 'Vary' header) changing. Anyway, thanks again... -Neil
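Two details in the HEAD output above may be worth checking. First, RFC 2616 (section 14.9) defines Cache-Control as a comma-separated list of directives, while these responses separate them with semicolons, so a strict parser can read the whole value as one unrecognised directive instead of must-revalidate, s-maxage and max-age. The Expires header would still make the page cacheable, and the thread does not establish whether this mattered. Second, must-revalidate only obliges a cache to revalidate once the entry has become stale (section 14.9.4), so by itself it does not make a browser contact the server while the page is still fresh. The simplified Python parser below (the function name is hypothetical, and quoted arguments that contain commas are not handled) shows how the two spellings split:

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None},
    treating commas as the only directive separator (RFC 2616 list syntax)."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


print(parse_cache_control("must-revalidate, s-maxage=900, max-age=901"))
# {'must-revalidate': None, 's-maxage': '900', 'max-age': '901'}
print(parse_cache_control("must-revalidate; s-maxage=900; max-age=901"))
# {'must-revalidate; s-maxage': '900; max-age=901'}
```

With commas, max-age=901 is recognised; with semicolons, a parser like this one never sees a max-age directive at all.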
Re: mod_proxy distinguish cookies?
Hi Neil... This is Kevin Kiley...

Personally, I don't think this discussion is all that OT for Apache but others might disagree. "Vary:" is still a broken mess out there and if 'getting it right' is still anyone's goal then these are the kinds of discussions that need to take place SOMEWHERE. Apache is not the W3C but it's about as close as you can get.

I haven't looked at this whole thing for a LOOONG time so I had to go back and check my notes regarding the MSIE 'User-Agent' trick.

As absurd as it sounds... you actually got the point. "User-Agent:" IS, in fact, supposed to be a 'request-side' header but when it comes to "Vary:"... the world can turn upside down and what doesn't seem to make any sense can actually WORK.

Unfortunately... I can't find the (old) notes I had about exactly what I did to make the "Vary: User-Agent" trick actually work with MSIE. I was just mucking around and never had any intention of implementing this as a solution for anything but I DO remember somehow making it WORK ( almost ) just the way you are doing it. If I have some time... I'll try to find those notes and the test code I know I had somewhere that WORKED.

Another fellow who just responded pointed out that "Content-Encoding:" seems to be another field that MSIE will actually react to when it comes to VARY.

Well... it had been so long since I mucked with all this I had to go back and find/read some notes. The fellow who posted is SORT OF right about "Content-Encoding:" LOOKING like it can "Vary:" but it's not really "Vary:" at work at all. The REALITY is explained in that link I already supplied in my last message...

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

Unless there has been some major change or patch to MSIE 6.0 and above then I still stand by my original research/statement...
MSIE will treat ANY field name OTHER than "User-Agent" that arrives with a "Vary:" header on a non-compressed response as if it had received "Vary: *" ( Vary: STAR ) and it will NOT CACHE that response locally. Every reference to the page ( via Refresh, Back-button, local hyperlink-jump, whatever ) will cause MSIE to go all the way upstream for a new copy of the page EVERY TIME. Maybe this is really what you want? Dunno.

The reason it also LOOKS like "Content-Encoding" is being accepted as a VARY and MSIE is sending out an 'If-Modified-Since:' on those pages is NOT because it is doing "Vary:"... it's for other strange reasons.

Whenever MSIE receives a compressed response ( Content-encoding: gzip ) then it will ALWAYS cache that response... even if it has been specifically told to NEVER do that ( no-cache, Expires: -1, whatever ). It HAS to. MSIE ( and Netscape ) MUST use the CACHE FILE to DECOMPRESS the response... and it always KEEPS it around. Neither MSIE, Netscape, nor Opera is able to 'decompress' in memory. They all MUST have a cache file to work from even if they are not supposed to EVER cache that particular response. They just do it anyway.

So... to make a long story short... MSIE will always decide it MUST cache a response with any kind of "Content-Encoding:" on it and it will set the cache flags for that puppy to 'always-revalidate', and that's where the "If-Modified-Since:" output is coming from, which makes it LOOK like "Vary:" is involved... but it is NOT.

However... in the world of "Vary:" you run into this snafu whereby you can't differentiate between what you are trying to tell an inline Proxy Cache 'what to do' versus an end-point user-agent.
Example: If you are a COS ( Content Origin Server ) and you want a downstream Proxy Cache to 'Vary' the ( non-expired ) response it might give out according to whether a requestor says it can handle compression or not ( Accept-encoding: gzip, deflate ) then the right VARY header to add to the response(s) is "Vary: Accept-Encoding" and not "Vary: Content-Encoding". The "Content-Encoding" only comes FROM the Server. The 'decision' you want the Proxy Cache to make can only be based on whether a requestor has sent "Accept-Encoding: gzip, deflate" ( or not ).

If there is no inline Proxy ( which is always impossible to tell ) and the response goes direct to the browser then the same "Vary:" header that would 'do the right thing' for a Proxy Cache is meaningless for the end-point user-agent itself. The User-Agent never 'varies' its own 'Accept-Encoding:' output header ( unless you are using Opera and clicking all those 'imitate other browser' options in-between requests for the same resource ).

One of the biggest misconceptions out there is that browsers are somehow REQUIRED to obey all the RFC standard caching rules as if they were HTTP/x.x compliant Proxy Caches. They are NOT. The RFCs themselves say that end-point user agents can be 'implementation specific' when it comes to caching and should not be considered true "Proxy Caches". Most major browsers DO 'follow the rules' ( sort of ) but none of them could be considered true HTTP
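Kevin's point about proxies is that the selecting header for compressed variants is the request's Accept-Encoding, not the response's Content-Encoding. As a minimal sketch (the function names are hypothetical, header names are assumed lower-cased, and the bodies are placeholders), a cache holding a gzip copy and an uncompressed copy of a response marked "Vary: Accept-Encoding" might choose between them like this:

```python
from typing import Dict


def accepts_gzip(accept_encoding: str) -> bool:
    """True if an Accept-Encoding value lists gzip with a non-zero q-value.

    Simplification: the "*" wildcard and the legacy "x-gzip" token are ignored.
    """
    for item in accept_encoding.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() != "gzip":
            continue
        q = 1.0
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def pick_variant(variants: Dict[str, bytes], request_headers: Dict[str, str]) -> bytes:
    """Pick the cached gzip or identity copy of a "Vary: Accept-Encoding"
    response, based only on what the requestor says it accepts."""
    if accepts_gzip(request_headers.get("accept-encoding", "")):
        return variants["gzip"]
    return variants["identity"]


variants = {"gzip": b"<gzip-compressed body>", "identity": b"<plain body>"}
print(pick_variant(variants, {"accept-encoding": "gzip, deflate"}))  # b'<gzip-compressed body>'
print(pick_variant(variants, {"user-agent": "SomeBrowser/1.0"}))     # b'<plain body>'
```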
Re: mod_proxy distinguish cookies?
[EMAIL PROTECTED] wrote:

Bottom line: In order to do your 'Cookie' scheme and have it work with all major browsers you might have to give up on the idea that the responses can EVER be 'cached' locally by a browser... but now you also lose the ability to have it cached by ANYONE. There is no HTTP caching control directive that says...

Cache-Control: no-cache-only-if-endpoint-user-agent

Given the caching issues in most 'end-point' browsers... there probably should be such a directive. The ONLY guy you don't want to cache it is the end-point browser itself... but you DO want the response available from other nearby caches so your Content Origin Server doesn't get hammered to death.

Thanks again Kevin for the insight and interesting links. It seems to me that there are basically three components here: my server, intermediate caching proxies, and the end-user browser. From my understanding of the discussion so far, each of these can be covered as follows:

1. My server: Cookies can be understood (i.e. queries are differentiated) by my server's reverse proxy cache.

2. Intermediate caching proxies: I can use the 'Vary: Cookie' header to tell any intermediate caches that cookies differentiate requests.

3. Browsers: Pass the option cookie around as part of the URL param list (relatively easy to do using HTML::Embperl or other template solution). So if the cookie is opts=123, then I make every link on my site be of the form /somedir/example.html?opts=123

This makes the page look different to the browser when the cookie is changed, so the browser will have to get the new version of the page. I don't actually use the URL param on the backend, only the cookie version of the value is used. The URL param is simply there to make the URL look different to the browser. Thus if someone posts a link to my website with opts=123 in the query string, and then someone with cookie opts=456 clicks on that link, they should successfully get the right version of the page.
I think all this allows me to have pages be cached, while also allowing cookies to be used to store options. This does assume that any real proxy caches in the middle obey the Vary: Cookie header. If they get a request for a page in their cache from a browser with a different cookie to that associated with the cache entry, then presumably the cache is required to not use the cache entry and pass it through to the origin server. This obviously isn't ideal, but it attempts to address the world as it seems to be today. If anyone sees any potential problems with this sort of setup, then let me know... Thanks again, this has been a very enlightening discussion. -Neil
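Step 3 amounts to rewriting every generated link so it carries the current option value. Neil does this with Perl routines inside HTML::Embperl templates; the following is only a Python sketch of the same idea (the function name with_opts is made up for illustration), using the standard urllib.parse module to set or replace the opts parameter while keeping other parameters and any fragment:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_opts(link: str, opts: str) -> str:
    """Return link with its opts query parameter set to the given value."""
    parts = urlsplit(link)
    # Keep every other parameter, drop any stale opts value, then append ours.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "opts"]
    query.append(("opts", opts))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_opts("/somedir/example.html", "123"))   # /somedir/example.html?opts=123
print(with_opts("/pics/view?id=7&opts=9", "123"))  # /pics/view?id=7&opts=123
```

Because the backend reads the option from the cookie, the query parameter here only serves to give each option setting its own URL, and therefore its own entry in the browser's cache.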
Re: mod_proxy distinguish cookies?
Neil wrote...

Thanks again Kevin for the insight and interesting links. It seems to me that there are basically three components here: My server, intermediate caching proxies, and the end-user browser. From my understanding of the discussion so far, each of these can be covered as follows:

1. My server: Cookies can be understood (i.e. queries are differentiated) by my server's reverse proxy cache.

Sure... but only if you are receiving all the requests WHEN and AS OFTEN as you need to. ( User-Agents coming back for pages when they are supposed to )...

2. Intermediate caching proxies: I can use the 'Vary: Cookie' header to tell any intermediate caches that cookies differentiate requests.

Nope. Scratch the word 'any' and substitute 'some'. There are very few 'Intermediate caching proxies' that are able to 'do the right thing' when it comes to 'Vary:'. MOST Proxy Cache Servers ( including ones that SAY they are HTTP/1.1 compliant ) do NOT handle Vary: and they will simply treat ANY response they get with a "Vary:" header of any kind exactly the way MSIE seems to. They will treat it as if it was "Vary: *" ( Vary: STAR ) and will REFUSE to cache it at all. Might as well just use 'Cache-Control: no-cache'. It will be the same behavior for caches that don't support "Vary:".

SQUID is the ONLY caching proxy I know of that even comes close to handling "Vary:" correctly, but only the latest version(s). For years now... even SQUID would just 'punt' any response that had any kind of "Vary:" header at all. It would default all "Vary: xx" headers to "Vary: *" ( Vary: STAR ) and never bother to cache them at all.

Even the latest version of SQUID is still not HTTP/1.1 compliant. There are still a lot of 'Etag:' things that don't get handled correctly. It's possible to implement "Vary:" without doing full "Etag:" support as well but there will always be times when the response is not cacheable unless full "Etag:" support is onboard.
So you CAN/SHOULD use the "Vary: Cookie" response header and it WILL work for SOME inline caches... but be fully prepared for users to report problems when the inline cache is paying no attention to your "Vary:".

3. Browsers: Pass the option cookie around as part of the URL param list (relatively easy to do using HTML::Embperl or other template solution). So if the cookie is "opts=123", then I make every link on my site be of the form "/somedir/example.html?opts=123...". This makes the page look different to the browser when the cookie is changed, so the browser will have to get the new version of the page.

Not sure. Maybe. I guess I really don't follow what the heck you are trying to do here. What do you mean by 'make every link on my site be of the form uri?' Don't you mean you want everyone USING your site to be sending these various 'cookie' deals so you can tell who is who and something just steps in and makes sure they get the right response? You should not have to 'make every link on my site' be anything. Something else should be sorting all the requests out. I guess I just don't get what it is you are trying to do that falls outside the boundaries of normal CGI and 'standard practice'. AFAIK 'shopping carts' had this all figured out years ago.

Now... if what you meant was that every time you send a PAGE down to someone with a particular cookie ( Real Cookie:, not URI PARMS one ) and you re-write all the clickable 'href' links in THAT DOCUMENT to have the 'other URI cookie' then yeah I guess that will work. That should force any 'clicks' on that page to come back to you so that YOU can decide where they go or if that Cookie needs to change. But that would mean rewriting every page on the way out the door. Surely there must be an easier way to do whatever it is you are trying to do.

Officially... the fact that you will be using QUERY PARMS at all times SHOULD take you out of the 'caching' ball game altogether since the mere presence of QUERY PARMS in a URI is SUPPOSED to make it ineligible for caching at any point in the delivery chain. In other words... might as well use 'Cache-Control: no-cache' and just force everybody to come back all the time.

...This makes the page look different to the browser when the cookie is changed, so the browser will have to get the new version of the page.

Again... I am not sure I would say 'have to'. There is no 'have to' when it comes to what a User-Agent may or may not be doing with cached files. Most of them follow the rules but many do not. I think you might be a little confused about what is actually going on down at the browser level.

Just because someone hits a 'Forward' or a 'Back' button on some GUI menu doesn't mean the HTTP freshness ( schemes ) always come into play. All you are asking the browser to do is jump between pages it has stored locally and that local cache is not actually required to be HTTP/1.1 compliant. Usually is NOT. Only the REFRESH button ( or CTRL-R ) can FORCE
Re: mod_proxy distinguish cookies?
[EMAIL PROTECTED] wrote:

MOST Proxy Cache Servers ( including ones that SAY they are HTTP/1.1 compliant ) do NOT handle Vary: and they will simply treat ANY response they get with a Vary: header of any kind exactly the way MSIE seems to. They will treat it as if it was Vary: * ( Vary: STAR ) and will REFUSE to cache it at all.

That's fine with me... I am mainly concerned with the browser and my server. I know the browser will cache stuff when I want it to, and so will my own reverse proxy. If intermediate caches choose not to then I don't think it will have a huge effect on my server.

I guess I really don't follow what the heck you are trying to do here. What do you mean by 'make every link on my site be of the form uri?'

Check out the site in question, http://www.crazyguyonabike.com/ for an example of what I'm talking about. The code on this site may change in the next couple of days, as I move over to the new way of doing things (outlined in the previous email), but it does currently have the pics=xxx on all URLs on the site. I achieve this by having global Perl routines for writing all links in all the pages. This is done in HTML::Embperl templates - every page on the site is a template. This is the way that you can pass options around the site without using cookies. The flaw is as I mentioned previously: if someone posts a link somewhere, then that link will inevitably have the poster's options embedded in the URL. So anyone who clicks on that link will get their own options overwritten by the ones in the link. This does work just fine currently, and has for a while now in fact.

I guess I just don't get what it is you are trying to do that falls outside the boundaries of normal CGI and 'standard practice'.

What I do currently falls well within normal CGI conventions and 'standard practice', afaik.
I have also tested this with the major browsers (at least IE and Mozilla) and it works just fine, with the browser caching requests correctly according to the Cache-Control and Expires headers, and also distinguishing requests based on the URL. Perhaps this is just by coincidence and isn't the way the standards are supposed to work, but then again I think it's probable that things in the HTTP world are so entrenched at this point that if they changed the way all this works, it would just break too many sites. So it'll probably stay like this for the foreseeable future, if previous experience of inertia is anything to go by...

AFAIK 'shopping carts' had this all figured out years ago. Now... if what you meant was that every time you send a PAGE down to someone with a particular cookie ( Real Cookie:, not URI PARMS one ) and you re-write all the clickable 'href' links in THAT DOCUMENT to have the 'other URI cookie' then yeah I guess that will work. That should force any 'clicks' on that page to come back to you so that YOU can decide where they go or if that Cookie needs to change. But that would mean rewriting every page on the way out the door. Surely there must be an easier way to do whatever it is you are trying to do.

Using a template tool like HTML::Embperl, this is really not all that big a deal. Every single page on my site is a template, some with HTML and Perl code, some pure Perl modules. It may offend some purists, but I've been developing this site for over three years now and it works well for me.

Officially... the fact that you will be using QUERY PARMS at all times SHOULD take you out of the 'caching' ball game altogether since the mere presence of QUERY PARMS in a URI is SUPPOSED to make it ineligible for caching at any point in the delivery chain.

Is this true, or is it just something that the early proxies did because of assumptions about CGI scripts being always dynamic and therefore not cacheable?
I think I read that somewhere (or maybe it was a comment about URLs with 'cgi-bin'), and anyway as I said earlier, these requests seem to be cached correctly by mod_proxy, mod_accel and the browsers, as long as the correct Expires and Cache-Control headers are present. I found that Last-Modified had to be present as well for mod_proxy to cache, I seem to recall. But anyway, it does work.

In other words... might as well use 'Cache-Control: no-cache' and just force everybody to come back all the time.

I don't think this is necessarily true, just from my own testing.

Just because someone hits a 'Forward' or a 'Back' button on some GUI menu doesn't mean the HTTP freshness ( schemes ) always come into play. All you are asking the browser to do is jump between pages it has stored locally and that local cache is not actually required to be HTTP/1.1 compliant. Usually is NOT. Only the REFRESH button ( or CTRL-R ) can FORCE some browsers to 're-validate' a page. Simple local button navigations and re-displays from a local history list do not necessarily FORCE the browser to do anything at all 'out on the wire'. My own local Doppler Radar page is
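On Neil's question about query strings: RFC 2616 section 13.9 says caches MUST NOT treat responses to URIs containing "?" as fresh unless the server provides an explicit expiration time. That is consistent with Neil's observation that such pages are cached once Expires and Cache-Control are set. Below is a minimal Python sketch of that rule; the function names are hypothetical, header names are assumed lower-cased, s-maxage is ignored, and the heuristic lifetime is left to the caller because RFC 2616 does not fix one. The example values are the Date and Expires lines from Neil's HEAD output, which give a 901-second lifetime.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def explicit_lifetime(headers: Dict[str, str]) -> Optional[float]:
    """Freshness lifetime in seconds from max-age, else Expires minus Date.

    Returns None when the server gave no explicit expiration time.
    """
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return float(value)
            except ValueError:
                return 0.0  # malformed max-age: treat as already stale
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
        except (TypeError, ValueError):
            return 0.0  # invalid dates count as "already expired"
        return (expires - date).total_seconds()
    return None


def is_fresh(url: str, headers: Dict[str, str], age: float,
             heuristic_lifetime: float = 0.0) -> bool:
    lifetime = explicit_lifetime(headers)
    if lifetime is None:
        if "?" in url:
            return False  # RFC 2616 13.9: no heuristic freshness for query URIs
        lifetime = heuristic_lifetime
    return age < lifetime


neil = {"date": "Wed, 05 May 2004 16:08:34 GMT",
        "expires": "Wed, 05 May 2004 16:23:35 GMT"}
print(explicit_lifetime(neil))                     # 901.0
print(is_fresh("/page?opts=123", neil, age=60))    # True
print(is_fresh("/page?opts=123", {}, age=60, heuristic_lifetime=3600))  # False
```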
Re: mod_proxy distinguish cookies?
Graham Leggett wrote:

I would disagree - if a proxy on the net cached every variant of every page simply based on a cookie header, there would be so many different variants of the same page in the cache that from a system resource perspective the cache might as well not be there. Cookies only make sense in most cases when caching has been switched off, as the cookie is usually targeted at that single user only. Your application is a unique one, in that you're trying to improve the performance of a single server on the net. This should be done within the design of that server, not by trying to change the RFC to accommodate what is a special case.

Is this really such a special case? I can't believe nobody else has wanted to implement a server like this. If you want to have a setup where there is a heavy backend app server, with a lightweight reverse proxy front end, and you also want to have pages be cached, AND have personalization of pages based on cookies, then it makes perfect sense to store user options in a cookie, and then for the pages to be cached taking cookies into account. That's pretty much what cookies were made for.

In this case, a cookie that set 'opts=xxx' can be seen as equivalent to having 'opts=xxx' in the request query string - but instead of the parameter having to be present in the query string, it's there in the cookie. This is much more useful, because it means that this parameter can be set once in the browser, so that this user always uses this option on this server. All pages which have the same request and same option cookie would be seen as the same page by browsers and caches. Any pages with the same request, but different option cookie are treated differently. To the caches, this is no different from passing the option in the query string.

I can see that not every cookie should be seen in this way. The solution to this would perhaps be an additional property for cookies to determine how they are treated by caches and browsers.
In order to not break existing behavior, the default could be what happens now - i.e. cookies are ignored as far as differentiating requests. But if there was some cookie setting that said user param or something similar, then it could be used by browsers and intermediate caches to differentiate. If a website used the query string to pass options around, then every page that had a different option would have to be cached differently anyway, so this really doesn't add any additional stress to the network. It's simply moving an option from the query string into the cookie area, so that links posted around the internet don't contain users' individual settings. It just doesn't make any sense for website user options to be stored in the URL, because it makes a nonsense out of the whole concept of setting options - anytime you happen to click on some other user's link to the same website, it wipes out any options you set yourself. Cookies are made for this sort of thing. Some cookies (random numbers, tracking cookies) don't have to be treated in this way, but I think having an additional property that makes a cookie be treated in the same way as a query string param would be very beneficial. I don't know what hope there is for getting anything like this actually implemented in the standards... but if anyone has any ideas, I'm all ears... Thanks again, -Neil
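On his own server, Neil's reverse proxy controls its cache key, so it can already treat one cookie like a query parameter and ignore the rest. The following is a minimal Python sketch of that idea, not mod_accel or mod_proxy configuration. The function name and the choice of opts as the only relevant cookie are assumptions for illustration; here the server decides which cookies matter, whereas Neil's proposal would carry that flag on the cookie itself.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Iterable, Optional


def cache_key(url: str, cookie_header: Optional[str],
              relevant: Iterable[str] = ("opts",)) -> str:
    """Cache key built from the URL plus only the cookies named in
    `relevant`; tracking or session cookies are left out of the key."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header or "")
    except CookieError:
        jar = SimpleCookie()  # unparseable Cookie header: key on the URL only
    parts = [f"{name}={jar[name].value}" for name in sorted(relevant) if name in jar]
    return url + "|" + ";".join(parts)


print(cache_key("/p", "opts=123; session=abc"))  # /p|opts=123
print(cache_key("/p", "session=xyz; opts=123"))  # /p|opts=123
print(cache_key("/p", "opts=456"))               # /p|opts=456
```

Two users with the same opts value share one cache entry regardless of their session cookies, which is the behaviour Neil describes as equivalent to passing the option in the query string.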
Re: mod_proxy distinguish cookies?
Neil Gunton wrote:

Is this really such a special case? I can't believe nobody else has wanted to implement a server like this.

It's a special case in the context of all of the servers, proxies, transparent proxies and browsers together out there on the net - it's useful to take load off your server, but at the cost of _increasing_ the load on transparent proxies on the net. That's not to say that making an attempt to reduce the load on your server is a bad idea or even a rare occurrence (it's not), it's just that changing an RFC to do it is not the right way to achieve this.

If you want to have a setup where there is a heavy backend app server, with a lightweight reverse proxy front end, and you also want to have pages be cached, AND have personalization of pages based on cookies, then it makes perfect sense to store user options in a cookie, and then for the pages to be cached taking cookies into account.

There is already a mechanism for caching different variants of a page - simply encode the info into the URL. This is supported on all browsers and cannot be switched off through user preference (as cookies can). Because a mechanism already exists, there isn't much point in changing the standard to accommodate a second method to do the same thing.

But you're also fighting with existing websites that use cookies to try and track individual requests, and there are a lot of them out there. If each different cookie was cached separately, then you're effectively caching separate copies of every page, which makes caching a waste of time.

Regards, Graham
--
Re: mod_proxy distinguish cookies?
Graham Leggett wrote: There is already a mechanism for caching different variants of a page - simply encode the info into the URL. This is supported on all browsers and cannot be switched off through user preference (as cookies can). Because a mechanism already exists, there isn't much point in changing the standard to accommodate a second method to do the same thing. As I said previously, storing user options in the URL is broken because following someone else's link to the same website erases your options. I currently use this on my website, to pass an option for size of pics (thumbnail, small or large). Every time someone posts a link to a page on my website on some message board or email, they inevitably include the whole query string, with whatever option they happen to have at that moment. So every person who clicks on the link gets their option overwritten by the pic option of the person who posted the link. I don't see how anyone could see this as being a good way to do things. But you're also fighting with existing websites that use cookies to try and track individual requests, and there are a lot of them out there. If each different cookie was cached separately, then you're effectively caching separate copies of every page, which makes caching a waste of time. I suggested expanding the cookie definition to include a type or qualifier that could be used to say whether the cookie should be treated as a param. Using cookies in this way would not put any more load on the net than at present, if the default cookie behavior was left as it is now (i.e. with the additional qualifier being required in order to have the cookie taken into account). Using a special cookie or using the URL are both functionally equivalent as far as the information being passed is concerned, the crucial difference being that following someone else's link would not erase your options, because they are being passed via cookie.
To emphasize: I am not suggesting that EVERY cookie out there already be used by caches, but rather that we amend the standard so that certain cookies CAN be taken into account. This would be very useful, imho. One could make the argument that more traffic might be generated if websites started using the cookie qualifier to make ALL cookies be used by caches (thus ensuring that they would see every click by a particular user, making tracking all that much easier). However I don't think this would make any difference in reality, since websites that want this functionality can already get it by setting the pages to be no-cache. The cookie qualifier would add the benefit of being able to cache pages that have the same options set as the same cache entry. The addition of a cookie cache qualifier would not break any existing systems, because the default behavior of cookies remains unchanged. It would also not put any more load on the net than would be caused by sites passing options in the URL, since each request with a different option in the URL would have to be cached differently anyway. We gain something, and lose nothing, as far as I can tell. All the best, -Neil
Re: mod_proxy distinguish cookies?
Rather just use URL parameters. As I recall RFC2616 does not consider a request with a different cookie a different variant, so even if you patch your server to allow it to differentiate between cookies, neither the browsers nor the transparent proxies in the path of the request will do what you want them to do :( Well, that truly sucks. If you pass options around in params then whenever someone follows a link posted by someone else, they will inherit that person's options. I do wish people would read the specification to refresh their memory before summarizing. RFC 2616 doesn't say anything about cookies -- it doesn't have to because there are already several mechanisms for marking a request or response as varying. In this case Vary: Cookie added to the response by the server module (the only component capable of knowing how the resource varies) is sufficient for caching clients that are compliant with HTTP/1.1. Expires and Cache-Control are usually added as well if HTTP/1.0 caches are a problem. Roy
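As a rough illustration of what Roy describes, here is a toy Python sketch of how an HTTP/1.1-style cache can keep several variants of one URI apart by recording the values of the request headers named in the response's Vary header (the "selecting" request headers of RFC 2616 section 13.6). The class and method names are hypothetical, header values are compared as exact strings, and expiry and revalidation are left out; a `Vary: *` response is simply not stored. It is not how mod_proxy, mod_cache or any browser actually implements this.

```python
from typing import Dict, List, Optional, Tuple


class VaryCache:
    """Toy cache: one stored variant per combination of the request-header
    values named in the response's Vary header."""

    def __init__(self) -> None:
        # uri -> list of (vary header names, selecting header values, body)
        self._entries: Dict[str, List[Tuple[tuple, tuple, str]]] = {}

    @staticmethod
    def _lower(headers: Dict[str, str]) -> Dict[str, str]:
        # Header names are case-insensitive, so normalise them once.
        return {name.lower(): value for name, value in headers.items()}

    def store(self, uri: str, request_headers: Dict[str, str],
              response_headers: Dict[str, str], body: str) -> None:
        req = self._lower(request_headers)
        resp = self._lower(response_headers)
        vary = tuple(h.strip().lower()
                     for h in resp.get("vary", "").split(",") if h.strip())
        if "*" in vary:
            return  # Vary: * never matches a later request, so don't keep it
        selecting = tuple(req.get(h) for h in vary)
        variants = self._entries.setdefault(uri, [])
        # Replace any older copy of the same variant.
        variants[:] = [v for v in variants if v[:2] != (vary, selecting)]
        variants.append((vary, selecting, body))

    def lookup(self, uri: str,
               request_headers: Dict[str, str]) -> Optional[str]:
        req = self._lower(request_headers)
        for vary, selecting, body in self._entries.get(uri, []):
            # A stored variant is usable only if every selecting header matches.
            if tuple(req.get(h) for h in vary) == selecting:
                return body
        return None
```

With `Vary: Cookie`, a request for `/somedir/somepage.html?xxx=yyy` carrying `pics=small` and one carrying `pics=large` are stored and served as separate variants, while a request with no Cookie header misses. Because the whole Cookie header is the key, an unrelated tracking cookie also splits the cache, which is exactly the concern Graham raises about sites that use cookies to track requests.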
Re: mod_proxy distinguish cookies?
Roy T. Fielding wrote: I do wish people would read the specification to refresh their memory before summarizing. RFC 2616 doesn't say anything about cookies -- it doesn't have to because there are already several mechanisms for marking a request or response as varying. In this case Vary: Cookie added to the response by the server module (the only component capable of knowing how the resource varies) is sufficient for caching clients that are compliant with HTTP/1.1. My sentence "RFC2616 does not consider a request with a different cookie a different variant" should have read "RFC2616 does not recognise cookies specifically at all, as they are just another header". I did not think of the Vary case, sorry for the confusion. Regards, Graham --
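On the origin side, Roy's message says the server module should add `Vary: Cookie`, usually together with Expires and Cache-Control when HTTP/1.0 caches are a concern. Below is a minimal sketch of one common way to combine them, assuming a Python backend that builds its own response headers (the function name is hypothetical). `Cache-Control: max-age` lets HTTP/1.1 caches keep each cookie variant for a while, since RFC 2616 section 14.9.3 says max-age overrides Expires; an Expires date in the past makes older HTTP/1.0 caches, which generally predate both Vary and Cache-Control, treat the response as already stale.

```python
from email.utils import formatdate
from typing import Dict


def cookie_variant_headers(max_age: int) -> Dict[str, str]:
    """Response headers for a page whose content depends on the Cookie header."""
    return {
        # Tell HTTP/1.1 caches that the Cookie request header selects the variant.
        "Vary": "Cookie",
        # HTTP/1.1 caches: reuse each variant for up to max_age seconds.
        "Cache-Control": f"max-age={max_age}",
        # HTTP/1.0 caches ignore the two headers above; an Expires date in the
        # past (the Unix epoch here) makes them treat the page as stale.
        "Expires": formatdate(0, usegmt=True),
    }
```

The trade-off is that HTTP/1.0 caches in the path get no benefit at all for these pages, in exchange for never serving one user's variant to another.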
Re: mod_proxy distinguish cookies?
Graham Leggett wrote: Neil Gunton wrote: The problem now is that the browsers (IE and Mozilla at least) don't seem to differentiate requests based on cookies. I have tested requesting a page with a certain cookie (where the page has a sufficient expiration to warrant being cached for the duration of the test), and then changing the cookie, and re-requesting the same page as before. The cookie is different, but the browsers still seem to use their local cached copy of the page. So, I am currently thinking that the solution to this is to use a combination of cookies and URL parameters to make the requests look different. Rather just use URL parameters. As I recall RFC2616 does not consider a request with a different cookie a different variant, so even if you patch your server to allow it to differentiate between cookies, neither the browsers nor the transparent proxies in the path of the request will do what you want them to do :( Well, that truly sucks. If you pass options around in params then whenever someone follows a link posted by someone else, they will inherit that person's options. The only alternative might be to make pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel directive (which I haven't tried, but I assume that's what it does)... so even though my server will get hit a lot more, at least it might be stopped by the proxy rather than hitting the mod_perl. From what you are saying, it would appear that HTTP is broken with regard to cookies and caching. I thought they had all that sorted out a while back. Never mind... Thanks for the insight, I'll have to think about this some more it seems. Either have extremely volatile options via URL params with page caching, or no caching (outside of my server, which would mean a LOT more traffic since every time someone hits 'Back' on their browser it would think it had to re-get the page) and persistent options. Hmmm... 
Any other ideas would be welcomed, but right now that's about all I can think of... Thanks again, -Neil
Re: mod_proxy distinguish cookies?
Neil Gunton wrote: The problem now is that the browsers (IE and Mozilla at least) don't seem to differentiate requests based on cookies. I have tested requesting a page with a certain cookie (where the page has a sufficient expiration to warrant being cached for the duration of the test), and then changing the cookie, and re-requesting the same page as before. The cookie is different, but the browsers still seem to use their local cached copy of the page. So, I am currently thinking that the solution to this is to use a combination of cookies and URL parameters to make the requests look different. Rather just use URL parameters. As I recall RFC2616 does not consider a request with a different cookie a different variant, so even if you patch your server to allow it to differentiate between cookies, neither the browsers nor the transparent proxies in the path of the request will do what you want them to do :( Regards, Graham --
Re: mod_proxy distinguish cookies?
Igor Sysoev wrote: mod_accel ( http://sysoev.ru/en/ ) allows you to take cookies into account while caching: AccelCacheCookie some_cookie_name another_cookie_name You can set it on a per-location basis. Besides, my upcoming light-weight http and reverse proxy server nginx will allow to do it too. Double check first whether this is allowed by RFC2616 - remember that the Apache mod_proxy is very unlikely to be the only proxy in the chain, so even if mod_proxy takes cookies into account, other caches in the chain might not. Also, it is unlikely that changes will be made to the v1.3 tree code, it is more likely that such a feature might be found in the mod_cache modules of Apache v2.0. The original design of mod_cache allowed for the caching of different variants of the same URL (different variants might be different languages of the same page, etc), though I am not sure if that feature currently works. If it does, that would be what you need. Regards, Graham --
Re: mod_proxy distinguish cookies?
On Sat, 24 Apr 2004, Neil Gunton wrote: Neil Gunton wrote: Hi all, I apologise in advance if this is obvious or has otherwise been answered elsewhere, but I can't seem to find any reference to it. I am using Apache 1.3.29 with mod_perl, on Linux 2.4. I am running mod_proxy as a caching reverse proxy front end, and mod_perl on the backend. This works really well, but I have noticed that mod_proxy does not seem to be able to distinguish requests as being different if the URLs are the same, but they contain different cookies. I would like to be able to enable more personalization on my site, which would best be done using cookies. The problem is that when a page has an expiration greater than 'now', then any request to the same URL will get the cached version, even if the requests have different cookies. Currently I have to pass options around as part of the URL in order to make the requests look different to mod_proxy. Am I missing something here? Or, will this be included in either future versions of mod_proxy or the equivalent module in Apache 2.x? Any insights greatly appreciated. I should perhaps make clear that I do have cookies working through the proxy just fine, for pages that are set to be 'no-cache'. So this isn't an issue with the proxy being able to pass cookies to/from the backend and browser (which I think I have seen mentioned before as a bugfix), but rather with mod_proxy simply being able to distinguish otherwise identical URL requests that have different cookies, and cache those as different requests. So for example, the request GET /somedir/somepage.html?xxx=yyy passed with a cookie with value 'pics=small' should be seen as different from another identical request, but with cookie value 'pics=large'. Currently my tests indicate that mod_proxy returns the same cached page for each request. I assume that mod_proxy only checks the actual request string, and not the HTTP header which contains the cookie.
Obviously, under this scheme, if you were using cookies to track sessions then all requests would get passed to the backend server - so, perhaps it would be a nice additional feature to be able to configure, through httpd.conf, how mod_proxy (or its successor) pays attention to cookies. For example, you might say something to the effect of 'ignore this cookie' or 'differentiate requests using this cookie'. Then we could have sitewide options like e.g. 'pics' (to set what size pictures are shown), and this could be used to distinguish cached pages, but other cookies might be ignored on some pages. This would allow for more flexibility, with some cached pages being sensitive to cookies, while others are not. An obvious way this would be useful is in the use of login cookies. These will be passed in by the browser for every page on the site, but this doesn't mean we want to distinguish cached pages based on it for every page. Some user-specific pages would have 'no-cache' set, while other pages could be set to ignore this login cookie, thus gaining the benefits of the proxy caching. This would be useful for pages that have no user-specific or personalizable aspects - they could be cached regardless of who is logged in. Sorry if this wasn't clear from the original post, just wanted to clarify and expand... any advice on this would be VERY welcomed, since my options with personalization are currently rather limited. Also, if this is actually addressed to the wrong list for some reason then a pointer would be much appreciated... mod_accel ( http://sysoev.ru/en/ ) allows you to take cookies into account while caching: AccelCacheCookie some_cookie_name another_cookie_name You can set it on a per-location basis. Besides, my upcoming light-weight http and reverse proxy server nginx will allow to do it too. Igor Sysoev http://sysoev.ru/en/
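The per-location "differentiate requests using this cookie" idea, which is also what mod_accel's AccelCacheCookie is described as doing, amounts to folding the values of selected cookies into the cache key. Here is a minimal Python sketch of that idea. The `CACHE_COOKIES` table, its path prefixes and the function names are hypothetical, the longest-prefix match only loosely mimics location sections, and this is not mod_accel's or nginx's actual code. Cookies not listed for a location (a login or tracking cookie, say) leave the key unchanged, so those pages stay shared in the cache.

```python
from http.cookies import SimpleCookie
from typing import Dict, List, Optional

# Hypothetical per-location table: which cookies distinguish cached pages.
CACHE_COOKIES: Dict[str, List[str]] = {
    "/gallery/": ["pics"],  # picture-size preference changes the page
    "/about/": [],          # login/tracking cookies are ignored here
}


def cookies_for(path: str) -> List[str]:
    """Cookie names that vary the cache for this path (longest prefix wins)."""
    best = ""
    for prefix in CACHE_COOKIES:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return CACHE_COOKIES.get(best, [])


def cache_key(path_and_query: str, cookie_header: Optional[str]) -> str:
    """Request URI plus the values of the configured cookies, in sorted order."""
    path = path_and_query.split("?", 1)[0]
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    parts = [path_and_query]
    for name in sorted(cookies_for(path)):
        # A missing cookie is keyed as an empty value, i.e. a default variant.
        value = jar[name].value if name in jar else ""
        parts.append(f"{name}={value}")
    return "|".join(parts)
```

Under this sketch, `pics=small` and `pics=large` on a `/gallery/` page produce different keys, while two visitors with different session cookies but the same `pics` value share one cached copy.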
Re: mod_proxy distinguish cookies?
Neil Gunton wrote: Hi all, I apologise in advance if this is obvious or has otherwise been answered elsewhere, but I can't seem to find any reference to it. I am using Apache 1.3.29 with mod_perl, on Linux 2.4. I am running mod_proxy as a caching reverse proxy front end, and mod_perl on the backend. This works really well, but I have noticed that mod_proxy does not seem to be able to distinguish requests as being different if the URLs are the same, but they contain different cookies. I would like to be able to enable more personalization on my site, which would best be done using cookies. The problem is that when a page has an expiration greater than 'now', then any request to the same URL will get the cached version, even if the requests have different cookies. Currently I have to pass options around as part of the URL in order to make the requests look different to mod_proxy. Am I missing something here? Or, will this be included in either future versions of mod_proxy or the equivalent module in Apache 2.x? Any insights greatly appreciated. I should perhaps make clear that I do have cookies working through the proxy just fine, for pages that are set to be 'no-cache'. So this isn't an issue with the proxy being able to pass cookies to/from the backend and browser (which I think I have seen mentioned before as a bugfix), but rather with mod_proxy simply being able to distinguish otherwise identical URL requests that have different cookies, and cache those as different requests. So for example, the request GET /somedir/somepage.html?xxx=yyy passed with a cookie with value 'pics=small' should be seen as different from another identical request, but with cookie value 'pics=large'. Currently my tests indicate that mod_proxy returns the same cached page for each request. I assume that mod_proxy only checks the actual request string, and not the HTTP header which contains the cookie.
Obviously, under this scheme, if you were using cookies to track sessions then all requests would get passed to the backend server - so, perhaps it would be a nice additional feature to be able to configure, through httpd.conf, how mod_proxy (or its successor) pays attention to cookies. For example, you might say something to the effect of 'ignore this cookie' or 'differentiate requests using this cookie'. Then we could have sitewide options like e.g. 'pics' (to set what size pictures are shown), and this could be used to distinguish cached pages, but other cookies might be ignored on some pages. This would allow for more flexibility, with some cached pages being sensitive to cookies, while others are not. An obvious way this would be useful is in the use of login cookies. These will be passed in by the browser for every page on the site, but this doesn't mean we want to distinguish cached pages based on it for every page. Some user-specific pages would have 'no-cache' set, while other pages could be set to ignore this login cookie, thus gaining the benefits of the proxy caching. This would be useful for pages that have no user-specific or personalizable aspects - they could be cached regardless of who is logged in. Sorry if this wasn't clear from the original post, just wanted to clarify and expand... any advice on this would be VERY welcomed, since my options with personalization are currently rather limited. Also, if this is actually addressed to the wrong list for some reason then a pointer would be much appreciated... Thanks again, -Neil