Re: mod_proxy distinguish cookies?

2004-05-06 Thread Julian Reschke
FYI: I recently had a long exchange with Microsoft's support regarding 
the Vary header, and the outcome was that they have at least 
*documented* their RFC2616 compliance issue:

http://support.microsoft.com/default.aspx?scid=kb;en-us;824847

Best regards, Julian

--
green/bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


Re: mod_proxy distinguish cookies?

2004-05-05 Thread TOKILEY

 Roy T. Fielding wrote:

 I do wish people would read the specification to refresh their memory
 before summarizing. RFC 2616 doesn't say anything about cookies -- it
 doesn't have to because there are already several mechanisms for marking
 a request or response as varying. In this case
 
 Vary: Cookie
 
 added to the response by the server module (the only component capable
 of knowing how the resource varies) is sufficient for caching clients
 that are compliant with HTTP/1.1.

 Graham wrote...

 My sentence "RFC2616 does not consider a request with a different cookie 
 a different variant" should have read "RFC2616 does not recognise 
 cookies specifically at all, as they are just another header". I did not 
 think of the Vary case, sorry for the confusion.
 
 Regards,
 Graham

"Vary" still won't work for the original caller's scenario.

Few people know this but Microsoft Internet Explorer and other
major browsers only PRETEND to support "Vary:".

In MSIE's case... there is only 1 value that you can use with
"Vary:" that will cause MSIE to make any attempt at all to
cache the response and/or deal with a refresh later.

That value is "User-Agent".

MSIE treats all other "Vary:" header values as if it
received "Vary: *" and will REFUSE to cache that
response at all.

This means that if you try and use "Vary:" for anything
other than "User-Agent" then the browser is going to
not cache anything (ever) and will be hammering away at
the unlucky nearest target ProxyCache and/or Content Server.
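
As a rough illustration only, the behaviour described above can be modelled as a small Python predicate. The function name msie_would_cache is made up for this sketch; it encodes the claim made in this message, not any documented MSIE internals.

```python
def msie_would_cache(vary_header):
    """Model of the behaviour described in this message (an assumption,
    not a documented browser API)."""
    if vary_header is None:
        # No Vary header at all: ordinary caching rules apply.
        return True
    fields = [f.strip().lower() for f in vary_header.split(",") if f.strip()]
    # Only "Vary: User-Agent" is said to be accepted; every other value
    # is said to be handled like "Vary: *", i.e. not cached.
    return fields == ["user-agent"]


for value in (None, "User-Agent", "Cookie", "Accept-Encoding", "*"):
    print(value, msie_would_cache(value))
```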

Why in the world an end-point User-Agent would only be
interested in doing a "Vary:" on its own name ( which it
already knows ) ceases to be a mystery if you read the
following link. The HACK that Microsoft added actually
originated as a problem report to the Apache Group itself
back in 1999...

URI title: Client bug: IE 4.0 breaks with "Vary" header.

http://bugs.apache.org/index.cgi/full/4118

Microsoft reacted to the problem with a simple HACK that
just looks for "User-Agent" and this fixed 4.0.

That simple hack is the only "Vary:" support MSIE really
has to this day.

The following message thread at W3C.ORG itself proves
that the "Vary:" problem still exists with MSIE 6.0 ( and other
major browsers )...

http://lists.w3.org/Archives/Public/ietf-http-wg/2002AprJun/0046.html

There is also a lengthy discussion about why "Vary:" is a 
nightmare on the client side at the mod_gzip forum.
The discussion centers on the fact that major browsers will
refuse to cache responses locally that have 
"Vary: Accept-encoding" and will end up hammering 
Content Servers but the discussion expanded when it
was discovered that most browsers won't do "Vary:" at all.

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

As far as this fellow's 'Cookie' issue goes... there is, in fact, a 
TRICK that you can use ( for MSIE, anyway ) that 
actually works.

Just defeat the HACK with another HACK.

If a COS ( Content Origin Server ) sends out a 
"Vary: User-Agent" then most major browsers
( MSIE included ) will, in fact, cache the response
locally and will 'react' to changes in "User-Agent:"
field when it sends out an "If-Modified-Since:"
refresh request.

If you create your own pseudo-cookies and just hide
them in the 'extra' text fields that are allowed to be
in any "User-Agent:" field then Voila... it actually WORKS!

I know that's going to send chills up Roy's spine but
it happens to actually WORK OK.

Nothing happens other than 'the right thing'.

MSIE sees a 'different' "User-Agent:" field coming
back and actually couldn't care less WHAT the
value is... it only knows that it's now 'different' and
so it just goes ahead and accepts a 
'fresh' response for the "Vary:".

If this fellow were to simply 'stuff' his Cookie into the
'extra text' part of the User-Agent: string and send
out a "Vary: User-Agent" along with the response
then it would actually work the way he expects it to.

Nothing else is going to solve the problem with MSIE,
I'm afraid, other than this 'HACK the HACK'.
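
A minimal sketch of the 'HACK the HACK' idea in Python, assuming the pseudo-cookie is written as an opts=... token inside the parenthesised comment of a User-Agent-style string. The helper names stuff_opts and read_opts, and the exact token layout, are assumptions for illustration; nothing here has been verified against any browser.

```python
import re


def stuff_opts(product, opts):
    """Build a User-Agent-style value carrying the option as extra text.
    The "(compatible; opts=...)" layout is an assumed convention."""
    return f"{product} (compatible; opts={opts})"


def read_opts(user_agent):
    """Pull the pseudo-cookie back out, or return None if it is absent."""
    match = re.search(r"\bopts=([^;)\s]+)", user_agent)
    return match.group(1) if match else None


value = stuff_opts("Mozilla/4.0", "300")
print(value)
print(read_opts(value))
```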

Later...
Kevin



Re: mod_proxy distinguish cookies?

2004-05-05 Thread Igor Sysoev
On Mon, 3 May 2004, Neil Gunton wrote:

 Well, that truly sucks. If you pass options around in params then
 whenever someone follows a link posted by someone else, they will
 inherit that person's options. The only alternative might be to make
 pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel
 directive (which I haven't tried, but I assume that's what it does)...
 so even though my server will get hit a lot more, at least it might be
 stopped by the proxy rather than hitting the mod_perl.

The AccelIgnoreNoCache disables a client's Pragma: no-cache,
Cache-Control: no-cache and Cache-Control: max-age=number headers.

The AccelIgnoreExpires disables a backend's Expires,
Cache-Control: no-cache and Cache-Control: max-age=number headers.


Igor Sysoev
http://sysoev.ru/en/


Re: mod_proxy distinguish cookies?

2004-05-05 Thread Neil Gunton
[EMAIL PROTECTED] wrote:
 If this fellow were to simply 'stuff' his Cookie into the
 'extra text' part of the User-Agent: string and send
 out a Vary: User-Agent along with the response
 then it would actually work the way he expects it to.

Thanks to Roy and Kevin for your insight. Sorry if this thread is
perhaps a bit off-topic for this list, but I hope you can indulge me
just a little longer. When I saw Roy's response regarding the 'Vary'
header, I thought that this would be exactly what I was after - you
could set 'Vary: Cookie' and then the browser would see that it should
reget the page if the cookie has changed. But this didn't seem to work
at all in practice. I am testing with the following sequence:

1. Get a page, which has Cache-Control and Expires headers set so that
it will be cached
2. Go to another page, where I use a form to change the option cookie
3. The options form sets the cookie and redirects the browser back to
the original page
4. The original page is displayed, not new version - browser doesn't
revalidate.

I have set all the headers, this is an example:

shell> HEAD http://dev.crazyguyonabike.com
200 OK
Cache-Control: must-revalidate; s-maxage=900; max-age=901
Connection: close
Date: Wed, 05 May 2004 16:08:34 GMT
Server: Apache
Vary: Cookie
Content-Length: 7020
Content-Type: text/html
Expires: Wed, 05 May 2004 16:23:35 GMT
Last-Modified: Wed, 05 May 2004 16:08:34 GMT
Client-Date: Wed, 05 May 2004 16:08:35 GMT
Client-Response-Num: 1
MSSmartTagsPreventParsing: TRUE

So I am setting the Cache-Control to cache the page, and the client is
directed to revalidate. I say in the Vary header that Cookie header must
be taken into account. But the browser simply fails to revalidate the
original page at all. If I manually refresh then it gets the correct
version, but I can't control manual refreshes (or user options) on the
browser end. I would simply love to be able to hit that sweet spot
where the browser caches the page, but also sees that some magic
component has changed and thus the old version of the page in the cache
cannot be used any more.
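
One piece of the puzzle can be sketched in Python: while a stored response is still younger than its max-age, a cache may reuse it without sending any request, and must-revalidate only takes effect once the entry has gone stale. The helper names below are hypothetical, and the age calculation is simplified (it ignores the Age header and clock skew).

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def is_fresh(date_header, max_age_seconds, now):
    """Return True while the stored response is younger than max-age."""
    stored_at = parsedate_to_datetime(date_header)
    return (now - stored_at) < timedelta(seconds=max_age_seconds)


def needs_request(date_header, max_age_seconds, now):
    """A fresh entry may be reused silently; only a stale one must be
    revalidated when must-revalidate is present."""
    return not is_fresh(date_header, max_age_seconds, now)


DATE = "Wed, 05 May 2004 16:08:34 GMT"  # Date header from the example above
stored = parsedate_to_datetime(DATE)
five_minutes_later = stored + timedelta(minutes=5)
after_expiry = stored + timedelta(seconds=902)

print(needs_request(DATE, 901, five_minutes_later))
print(needs_request(DATE, 901, after_expiry))
```

This only illustrates why must-revalidate on its own does not force a request during the first 901 seconds; it says nothing about how a particular browser treats Vary.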

When I saw Kevin's response, it made perfect sense at first, because
what he describes is exactly what I experienced above. Neither Mozilla
1.4 nor IE 6 appears to take any notice of the 'Vary: Cookie' header. I
decided to try Kevin's suggestion re the User-Agent field, but after
looking at this further I am very confused. The User-Agent field is
something that is passed in *from* the client, not *to* the client from
a server. Why would IE or any other client even look at a User-Agent
field? Ok, ok, I understand, the whole point is that this is a hack,
but even so it doesn't seem to work for me. I tried setting the
User-Agent field:

shell> HEAD http://dev.crazyguyonabike.com
200 OK
Cache-Control: must-revalidate; s-maxage=900; max-age=901
Connection: close
Date: Wed, 05 May 2004 16:08:34 GMT
User-Agent: Mozilla/4.0 (compatible; opts=300)
Server: Apache
Vary: User-Agent
Content-Length: 7020
Content-Type: text/html
Expires: Wed, 05 May 2004 16:23:35 GMT
Last-Modified: Wed, 05 May 2004 16:08:34 GMT
Client-Date: Wed, 05 May 2004 16:08:35 GMT
Client-Response-Num: 1
MSSmartTagsPreventParsing: TRUE

As you can see, I've encoded the opts cookie into the User-Agent header.
Am I doing this right? Nothing appears to change, indeed now IE doesn't
even get the proper version when I hit 'Refresh'. Maybe I'm being dense
and didn't read the instructions correctly, but it seemed like this was
what was being suggested.

Once again, I apologize if this is overly obvious or off-topic, but I
have the feeling that I'm just missing something obvious here. Any
insight would be much appreciated. In summary, the problem currently
appears to be that neither Mozilla nor IE appears to even want to
revalidate the original page after the cookie has changed. When the
browser is redirected back to the original page (using identical URL)
from the options form, both browsers just use their cached version,
without even touching the server at all. No request, nothing. When I use
the 'Vary: Cookie' header, then manually refreshing does get the new
version. I know that browser settings can determine how often the
browser revalidates the page, but I can't tell random users on the
internet to change their settings for my site. I would have thought that
it should be possible for a page to be cached, and yet still be
invalidated by the cookie (or, in the general case, some 'Vary' header)
changing.

Anyway, thanks again...

-Neil


Re: mod_proxy distinguish cookies?

2004-05-05 Thread TOKILEY

Hi Neil...
This is Kevin Kiley...

Personally, I don't think this discussion is all that OT for
Apache but others might disagree.

"Vary:" is still a broken mess out there and if 'getting it right'
is still anyone's goal then these are the kinds of discussions
that need to take place SOMEWHERE. Apache is not the
W3C but it's about as close as you can get.

I haven't looked at this whole thing for a LOOONG time so
I had to go back and check my notes regarding the 
MSIE 'User-Agent' trick.

As absurd as it sounds... you actually got the point.

"User-Agent:" IS, in fact, supposed to be a 'request-side'
header but when it comes to "Vary:"... the world can
turn upside down and what doesn't seem to make any
sense can actually WORK.

Unfortunately... I can't find the (old) notes I had about
exactly what I did to make the "Vary: User-Agent" trick
actually work with MSIE. I was just mucking around and
never had any intention of implementing this as a solution
for anything but I DO remember somehow making it WORK
( almost ) just the way you are doing it.

If I have some time... I'll try to find those notes and the
test code I know I had somewhere that WORKED.

Another fellow who just responded pointed out that
"Content-Encoding:" seems to be another field that
MSIE will actually react to when it comes to VARY.

Well... it had been so long since I mucked with all
this I had to go back and find/read some notes.

The fellow who posted is SORT OF right about
"Content-Encoding:" LOOKING like it can "Vary:"
but it's not really "Vary:" at work at all.

The REALITY is explained in that link I already
supplied in last message...

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

Unless there has been some major change or patch to MSIE 6.0
and above then I still stand by my original research/statement...

MSIE will treat ANY field name OTHER than "User-Agent"
that arrives with a "Vary:" header on a non-compressed
response as if it had received
"Vary: *" ( Vary: STAR ) and it will NOT CACHE that response
locally. Every reference to the page ( via Refresh, Back-button,
local hyperlink-jump, whatever ) will cause MSIE to go all
the way upstream for a new copy of the page EVERY TIME.

Maybe this is really what you want? Dunno.

The reason it also LOOKS like "Content-Encoding" is 
being accepted as a VARY and MSIE is sending out
an 'If-Modified-Since:' on those pages is NOT because
it is doing "Vary:"... it's for other strange reasons.

Whenever MSIE receives a compressed response
( Content-encoding: gzip ) then it will ALWAYS
cache that response... even if it has been specifically
told to NEVER do that ( no-cache, Expires: -1 , whatever ).

It HAS to. MSIE ( and Netscape ) MUST use the CACHE FILE
to DECOMPRESS the response... and it always KEEPS
it around.

Neither MSIE, Netscape, nor Opera is able to 'decompress'
in memory. They all MUST have a cache file to work from
even if they are not supposed to EVER cache that 
particular response. They just do it anyway.

So... to make a long story short... MSIE will always 
decide it MUST cache a response with any kind of
"Content-Encoding:" on it and it will set the cache 
flags for that puppy to 'always-revalidate' and that's
where the "If-Modified-Since:" output is coming from
which makes it LOOK like "Vary:" is involved...
but it is NOT.

However... in the world of "Vary:" you run into this snafu
whereby you can't differentiate between what you are
trying to tell an inline Proxy Cache 'what to do' versus
an end-point user-agent.

Example: If you are a COS ( Content Origin Server ) and
you want a downstream Proxy Cache to 'Vary' the 
( non-expired ) response it might give out according to
whether a requestor says it can handle compression
or not ( Accept-encoding: gzip, deflate ) then the right
VARY header to add to the response(s) is

"Vary: Accept-Encoding"

and not 

"Vary: Content-Encoding".

The "Content-Encoding" only comes FROM the Server.
The 'decision' you want the Proxy Cache to make can
only be based on whether a requestor has sent
"Accept-Encoding: gzip, deflate" ( or not ).

If there is no inline Proxy ( which is always impossible to tell )
and response is direct to browser then the same "Vary:"
header that would 'do the right thing' for a Proxy Cache
is meaningless for the end-point user-agent itself.
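
The proxy-side decision described above can be sketched in Python as a secondary cache key built from the request headers named in Vary. vary_key and TinyProxyCache are hypothetical names for this sketch, header values are compared as plain strings, and freshness is ignored entirely.

```python
def vary_key(vary_header, request_headers):
    """Build a secondary cache key from the request headers named in Vary.
    Returns None for "Vary: *", which can never be matched from cache."""
    names = [n.strip().lower() for n in vary_header.split(",") if n.strip()]
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return tuple((n, lowered.get(n, "")) for n in sorted(names))


class TinyProxyCache:
    """Toy store keyed on (URL, secondary key); not a real proxy."""

    def __init__(self):
        self._store = {}

    def put(self, url, request_headers, vary_header, body):
        key = vary_key(vary_header, request_headers)
        if key is not None:
            self._store[(url, key)] = (vary_header, body)

    def get(self, url, request_headers):
        for (stored_url, key), (vary_header, body) in self._store.items():
            if stored_url == url and vary_key(vary_header, request_headers) == key:
                return body
        return None


cache = TinyProxyCache()
cache.put("/page", {"Accept-Encoding": "gzip, deflate"}, "Accept-Encoding", b"gzipped")
print(cache.get("/page", {"Accept-Encoding": "gzip, deflate"}))
print(cache.get("/page", {}))
```

Because Content-Encoding never appears in a request, vary_key("Content-Encoding", ...) yields the same key for every requestor, which is one way to see why "Vary: Accept-Encoding" is the header that lets the proxy tell them apart.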

The User-Agent never 'varies' its own 'Accept-Encoding:'
output header ( unless you are using Opera and clicking
all those 'imitate other browser' options in-between requests
for the same resource ).

One of the biggest mis-conceptions out there is that browsers
are somehow REQUIRED to obey all the RFC standard 
caching rules as if they were HTTP/x.x compliant Proxy
Caches.

They are NOT. The RFC's themselves say that end-point
user agents can be 'implementation specific' when it comes
to caching and should not be considered true "Proxy Caches".

Most major browsers DO 'follow the rules' ( sort of ) but 
none of them could be considered true HTTP 

Re: mod_proxy distinguish cookies?

2004-05-05 Thread Neil Gunton
[EMAIL PROTECTED] wrote:
 Bottom line:
 
 In order to do your 'Cookie' scheme and have it work with
 all major browsers you might have to give up on the idea
 that the responses can EVER be 'cached' locally by
 a browser... but now you also lose the ability to have
 it cached by ANYONE.
 
 There is no HTTP caching control directive that says...
 
 Cache-Control: no-cache-only-if-endpoint-user-agent
 
 Given the caching issues in most 'end-point' browsers...
 There probably should be such a directive.
 
 The ONLY guy you don't want to cache it is the
 end-point browser itself... but you DO want the
 response available from other nearby caches so
 your Content Origin Server doesn't get hammered
 to death.

Thanks again Kevin for the insight and interesting links. It seems to me
that there are basically three components here: My server, intermediate
caching proxies, and the end-user browser. From my understanding of the
discussion so far, each of these can be covered as follows:

1. My server: Cookies can be understood (i.e. queries are
differentiated) by my server's reverse proxy cache.

2. Intermediate caching proxies: I can use the 'Vary: Cookie' header to
tell any intermediate caches that cookies differentiate requests.

3. Browsers: Pass the option cookie around as part of the URL param list
(relatively easy to do using HTML::Embperl or other template solution).
So if the cookie is opts=123, then I make every link on my site be of
the form /somedir/example.html?opts=123 This makes the page look
different to the browser when the cookie is changed, so the browser will
have to get the new version of the page. I don't actually use the URL
param on the backend, only the cookie version of the value is used. The
URL param is simply there to make the URL look different to the browser.
Thus if someone posts a link to my website with opts=123 in the query
string, and then someone with cookie opts=456 clicks on that link, they
should successfully get the right version of the page.
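
A minimal sketch of point 3 in Python (the site itself uses HTML::Embperl; Python is used here only for illustration). The helper names link_with_opts and effective_opts, and the default value, are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def link_with_opts(href, opts):
    """Return href with its opts query parameter set to the cookie's value,
    replacing any opts value already present in the link."""
    parts = urlsplit(href)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "opts"]
    query.append(("opts", opts))
    return urlunsplit(parts._replace(query=urlencode(query)))


def effective_opts(cookie_opts, url_opts, default="small"):
    """The backend trusts only the cookie; url_opts is deliberately ignored
    because it exists only to make the URL differ. The default is made up."""
    return cookie_opts if cookie_opts is not None else default


print(link_with_opts("/somedir/example.html", "123"))
print(effective_opts("456", "123"))
```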

I think all this allows me to have pages be cached, while also allowing
cookies to be used to store options. This does assume that any real
proxy caches in the middle obey the Vary: Cookie header. If they get a
request for a page in their cache from a browser with a different cookie
to that associated with the cache entry, then presumably the cache is
required to not use the cache entry and pass it through to the origin
server.

This obviously isn't ideal, but it attempts to address the world as it
seems to be today.

If anyone sees any potential problems with this sort of setup, then let
me know...

Thanks again, this has been a very enlightening discussion.

-Neil


Re: mod_proxy distinguish cookies?

2004-05-05 Thread TOKILEY

 Neil wrote...

 Thanks again Kevin for the insight and interesting links. It seems to me
 that there are basically three components here: My server, intermediate
 caching proxies, and the end-user browser. From my understanding of the
 discussion so far, each of these can be covered as follows:

 1. My server: Cookies can be understood (i.e. queries are
 differentiated) by my server's reverse proxy cache.

Sure... but only if you are receiving all the requests WHEN
and AS OFTEN as you need to. ( User-Agents coming back
for pages when they are supposed to )...

 2. Intermediate caching proxies: I can use the 'Vary: Cookie' header to
 tell any intermediate caches that cookies differentiate requests.

Nope. Scratch the word 'any' and substitute 'some'.

There are very few 'Intermediate caching proxies' that are able to
'do the right thing' when it comes to 'Vary:'.

MOST Proxy Cache Servers ( including ones that SAY they are
HTTP/1.1 compliant ) do NOT handle Vary: and they will simply
treat ANY response they get with a "Vary:" header of any kind
exactly the way MSIE seems to. They will treat it as if it was
"Vary: *" ( Vary: STAR ) and will REFUSE to cache it at all.

Might as well just use 'Cache-Control: no-cache'. It will be the
same behavior for caches that don't support "Vary:".

SQUID is the ONLY caching proxy I know of that even comes
close to handling "Vary:" correctly but only the latest version(s).

For years now... even SQUID would just 'punt' any response
that had any kind of "Vary:" header at all. It would default
all "Vary: xx" headers to "Vary: *" ( Vary: STAR ) and
never bother to cache them at all.

Even the latest version
of SQUID is still not HTTP/1.1 compliant. There are still a lot
of 'Etag:' things that don't get handled correctly.

It's possible to implement "Vary:" without doing full "Etag:"
support as well but there will always be times when the 
response is not cacheable unless full "Etag:" support
is onboard.

So you CAN/SHOULD use the "Vary: Cookie" response
header and it WILL work for SOME inline caches... but
be fully prepared for users to report problems when the
inline cache is paying no attention to your "Vary:".

 3. Browsers: Pass the option cookie around as part of the URL param list
 (relatively easy to do using HTML::Embperl or other template solution).
 So if the cookie is "opts=123", then I make every link on my site be of
 the form "/somedir/example.html?opts=123...". This makes the page look
 different to the browser when the cookie is changed, so the browser will
 have to get the new version of the page. 

Not sure. Maybe.

I guess I really don't follow what the heck you are trying to do here.

What do you mean by 'make every link on my site be of the form uri?'

Don't you mean you want everyone USING your site to be sending
these various 'cookie' deals so you can tell who is who and something
just steps in and makes sure they get the right response?

You should not have to 'make every link on my site' be anything.
Something else should be sorting all the requests out.

I guess I just don't get what it is you are trying to do that falls
outside the boundaries of normal CGI and 'standard practice'.

AFAIK 'shopping carts' had this all figured out years ago.

Now... if what you meant was that every time you send a PAGE
down to someone with a particular cookie ( Real Cookie:, not
URI PARMS one ) and you re-write all the clickable 'href' links
in THAT DOCUMENT to have the 'other URI cookie' then yea
I guess that will work. That should force any 'clicks' on that
page to come back to you so that YOU can decide where
they go or if that Cookie needs to change.

But that would mean rewriting every page on the way out the door.

Surely there must be an easier way to do whatever it is you
are trying to do.

Officially... the fact that you will be using QUERY PARMS at
all times SHOULD take you out of the 'caching' ball game
altogether since the mere presence of QUERY PARMS in
a URI is SUPPOSED to make it ineligible for caching at
any point in the delivery chain.

In other words... might as well use 'Cache-Control: no-cache'
and just force everybody to come back all the time.

 ...This makes the page look
 different to the browser when the cookie is changed, so the browser will
 have to get the new version of the page. 

Again.. I am not sure I would say 'have to'.

There is no 'have to' when it comes to what a User-Agent may or
may not be doing with cached files. Most of them follow the rules
but many do not.

I think you might be a little confused about what is actually going
on down at the browser level.

Just because someone hits a 'Forward' or a 'Back' button on some
GUI menu doesn't mean the HTTP freshness ( schemes ) always
come into play. All you are asking the browser to do is jump 
between pages it has stored locally and that local cache is
not actually required to be HTTP/1.1 compliant. Usually is NOT.

Only the REFRESH button ( or CTRL-R ) can FORCE 

Re: mod_proxy distinguish cookies?

2004-05-05 Thread Neil Gunton
[EMAIL PROTECTED] wrote:
 MOST Proxy Cache Servers ( including ones that SAY they are
 HTTP/1.1 compliant ) do NOT handle Vary: and they will simply
 treat ANY response they get with a Vary: header of any kind
 exactly the way MSIE seems to. They will treat it as if it was
 Vary: *  ( Vary: STAR ) and will REFUSE to cache it at all.

That's fine with me... I am mainly concerned with the browser and my
server. I know the browser will cache stuff when I want it to, and so
will my own reverse proxy. If intermediate caches choose not to then I
don't think it will have a huge effect on my server.

 I guess I really don't follow what the heck you are trying to do here.
 
 What do you mean by 'make every link on my site be of the form uri?'

Check out the site in question, http://www.crazyguyonabike.com/ for an
example of what I'm talking about. The code on this site may change in
the next couple of days, as I move over to the new way of doing things
(outlined in the previous email), but it does currently have the
pics=xxx on all URL's on the site. I achieve this by having global
Perl routines for writing all links in all the pages. This is done in
HTML::Embperl templates - every page on the site is a template. This is
the way that you can pass options around the site without using cookies.
The flaw is as I mentioned previously, if someone posts a link
somewhere, then that link will inevitably have the poster's options
embedded in the URL. So anyone who clicks on that link will get their
own options overwritten with the new link. This does work just fine
currently, has for a while now in fact.

 I guess I just don't get what it is you are trying to do that falls
 outside the boundaries of normal CGI and 'standard practice'.

What I do currently falls well within normal CGI conventions and
'standard practice', afaik. I have also tested this with the major
browsers (at least IE and Mozilla) and it works just fine, with the
browser caching requests correctly according to the Cache-Control and
Expires headers, and also distinguishing requests based on the URL.
Perhaps this is just by coincidence and isn't the way the standards are
supposed to work, but then again I think it's probable that things in
the HTTP world are so entrenched at this point that if they changed the
way all this works, it would just break too many sites. So it'll
probably stay like this for the foreseeable future, if previous
experience of inertia is anything to go by...

 AFAIK 'shopping carts' had this all figured out years ago.
 
 Now... if what you meant was that every time you send a PAGE
 down to someone with a particular cookie ( Real Cookie:, not
 URI PARMS one ) and you re-write all the clickable 'href' links
 in THAT DOCUMENT to have the 'other URI cookie' then yea
 I guess that will work. That should force any 'clicks' on that
 page to come back to you so that YOU can decide where
 they go or if that Cookie needs to change.
 
 But that would mean rewriting every page on the way out the door.
 
 Surely there must be an easier way to do whatever it is you
 are trying to do.

Using a template tool like HTML::Embperl, this is really not all that big
a deal. Every single page on my site is a template, some with HTML and
Perl code, some pure Perl modules. It may offend some purists, but I've
been developing this site for over three years now and it works well for
me.

 Officially... the fact that you will be using QUERY PARMS at
 all times SHOULD take you out of the 'caching' ball game
 altogether since the mere presence of QUERY PARMS in
 a URI is SUPPOSED to make it ineligible for caching at
 any point in the delivery chain.

Is this true, or is it just something that the early proxies did because
of assumptions about CGI scripts being always dynamic and therefore not
cacheable? I think I read that somewhere (or maybe it was a comment
about URLs with 'cgi-bin'), and anyway as I said earlier, these requests
seem to be cached correctly by mod_proxy, mod_accel and the browsers, as
long as the correct Expires and Cache-Control headers are present. I
found that Last-Modified had to be present as well for mod_proxy to
cache, I seem to recall. But anyway, it does work.

 In other words... might as well use 'Cache-Control: no-cache'
 and just force everybody to come back all the time.

I don't think this is necessarily true, just from my own testing.

 Just because someone hits a 'Forward' or a 'Back' button on some
 GUI menu doesn't mean the HTTP freshness ( schemes ) always
 come into play. All you are asking the browser to do is jump
 between pages it has stored locally and that local cache is
 not actually required to be HTTP/1.1 compliant. Usually is NOT.
 
 Only the REFRESH button ( or CTRL-R ) can FORCE some browsers
 to 're-validate' a page. Simple local button navigations and re-displays
 from a local history list do not necessarily FORCE the browser to
 do anything at all 'out on the wire'.
 
 My own local Doppler Radar page is 

Re: mod_proxy distinguish cookies?

2004-05-04 Thread Neil Gunton
Graham Leggett wrote:
 I would disagree - if a proxy on the net cached every variant of every
 page simply based on a cookie header, there would so many different
 variants of the same page in the cache that from a system resource
 perspective the cache might as well not be there. Cookies only make
 sense in most cases when caching has been switched off, as the cookie is
 usually targeted at that single user only.
 
 Your application is a unique one, in that you're trying to improve the
 performance of a single server on the net. This should be done within
 the design of that server, not by trying to change the RFC to accommodate
 what is a special case.

Is this really such a special case? I can't believe nobody else has
wanted to implement a server like this. If you want to have a setup
where there is a heavy backend app server, with a lightweight reverse
proxy front end, and you also want to have pages be cached, AND have
personalization of pages based on cookies, then it makes perfect sense
to store user options in a cookie, and then for the pages to be cached
taking cookies into account. That's pretty much what cookies were made
for. In this case, a cookie that set 'opts=xxx' can be seen as
equivalent to having 'opts=xxx' in the request query string - but
instead of the parameter having to be present in the query string, it's
there in the cookie. This is much more useful, because it means that
this parameter can be set once in the browser, so that this user always
uses this option on this server. All pages which have the same request
and same option cookie would be seen as the same page by browsers and
caches. Any pages with the same request, but different option cookie are
treated differently. To the caches, this is no different from passing
the option in the query string.
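
The equivalence argued above can be sketched as a cache key in Python, roughly as a reverse proxy under the site's own control might compute it. cache_key and the significant tuple are hypothetical; this is not how mod_proxy or mod_accel build their keys.

```python
from http.cookies import SimpleCookie


def cache_key(url, cookie_header, significant=("opts",)):
    """Key a cached page on the URL plus only the cookies named in
    significant; all other cookies (tracking, sessions) are ignored."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    picked = tuple((name, jar[name].value) for name in significant if name in jar)
    return (url, picked)


# Same option, different tracking cookie: one shared cache entry.
print(cache_key("/page", "opts=300; track=aaa") == cache_key("/page", "track=bbb; opts=300"))
# Different option: separate cache entries, as with a query-string parameter.
print(cache_key("/page", "opts=300") == cache_key("/page", "opts=456"))
```

Keying on the whole Cookie header instead would give every visitor with a unique tracking cookie a separate entry, which is the concern raised earlier in the thread.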

I can see that not every cookie should be seen in this way. The solution
to this would perhaps be an additional property for cookies to determine
how they are treated by caches and browsers. In order to not break
existing behavior, the default could be what happens now - i.e. cookies
are ignored as far as differentiating requests. But if there was some
cookie setting that said "user param" or something similar, then it
could be used by browsers and intermediate caches to differentiate.

If a website used the query string to pass options around, then every
page that had a different option would have to be cached differently
anyway, so this really doesn't add any additional stress to the network.
It's simply moving an option from the query string into the cookie area,
so that links posted around the internet don't contain users' individual
settings. It just doesn't make any sense for website user options to be
stored in the URL, because it makes a nonsense out of the whole concept
of setting options - anytime you happen to click on some other user's
link to the same website, it wipes out any options you set yourself.
Cookies are made for this sort of thing. Some cookies (random numbers,
tracking cookies) don't have to be treated in this way, but I think
having an additional property that makes a cookie be treated in the same
way as a query string param would be very beneficial.

I don't know what hope there is for getting anything like this actually
implemented in the standards... but if anyone has any ideas, I'm all
ears...

Thanks again,

-Neil


Re: mod_proxy distinguish cookies?

2004-05-04 Thread Graham Leggett
Neil Gunton wrote:

 Is this really such a special case? I can't believe nobody else has
 wanted to implement a server like this.

It's a special case in the context of all of the servers, proxies,
transparent proxies and browsers together out there on the net - it's 
useful to take off the load of your server, but at the cost of 
_increasing_ the load on transparent proxies on the net.

That's not to say that making an attempt to reduce the load on your 
server is a bad idea or even a rare occurrence (it's not), it's just that
changing an RFC to do it is not the right way to achieve this.

 If you want to have a setup
where there is a heavy backend app server, with a lightweight reverse
proxy front end, and you also want to have pages be cached, AND have
personalization of pages based on cookies, then it makes perfect sense
to store user options in a cookie, and then for the pages to be cached
taking cookies into account.

There is already a mechanism for caching different variants of a page - 
simply encode the info into the URL. This is supported on all browsers 
and cannot be switched off through user preference (as cookies can). 
Because a mechanism already exists, there isn't much point in changing 
the standard to accommodate a second method to do the same thing.

But you're also fighting with existing websites that use cookies to try 
and track individual requests, and there are a lot of them out there. If 
each different cookie was cached separately, then you're effectively 
caching separate copies of every page, which makes caching a waste of time.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

2004-05-04 Thread Neil Gunton
Graham Leggett wrote:
 There is already a mechanism for caching different variants of a page -
 simply encode the info into the URL. This is supported on all browsers
 and cannot be switched off through user preference (as cookies can).
 Because a mechanism already exists, there isn't much point in changing
 the standard to accommodate a second method to do the same thing.

As I said previously, storing user options in the URL is broken
because following someone else's link to the same website erases your
options. I use this currently on my website, to pass an option for size
of pics (thumbnail, small or large). Every time someone posts a link to
a page on my website on some message board or email, they inevitably
include the whole query string, with whatever option they happen to have
at that moment. So every person who clicks on the link gets their option
overwritten by the pic option of the person who posted the link. I don't
see how anyone could see this as being a good way to do things.
 
 But you're also fighting with existing websites that use cookies to try
 and track individual requests, and there are a lot of them out there. If
 each different cookie was cached separately, then you're effectively
 caching separate copies of every page, which makes caching a waste of time.

I suggested expanding the cookie definition to include a type or
qualifier that could be used to say whether the cookie should be treated
as a param. Using cookies in this way would not put any more load on the
net than at present, if the default cookie behavior was left as it is
now (i.e. with additional qualifier being required in order to have the
cookie taken into account). Using a special cookie or using the URL are
both functionally equivalent as far as information being passed, the
crucial difference being that, with a cookie, following a different URL
would not erase your options - they are being passed via cookie.

To emphasize: I am not suggesting that EVERY cookie out there already be
used by caches, but rather that we amend the standard so that certain
cookies CAN be taken into account. This would be very useful, imho.

One could make the argument that more traffic might be generated if
websites started using the cookie qualifier to make ALL cookies be used
by caches (thus ensuring that they would see every click by a
particular user, making tracking all that much easier). However I don't
think this would make any difference in reality, since websites that
want this functionality can already get it by setting the pages to be
no-cache. The cookie qualifier would add the benefit of being able to
cache pages that have the same options set as the same cache entry. 

The addition of a cookie cache qualifier would not break any existing
systems, because the default behavior of cookies remains unchanged. It
would also not put any more load on the net than would be caused by
sites passing options in the URL, since each request with a different
option in the URL would have to be cached differently anyway. We gain
something, and lose nothing, as far as I can tell.

All the best,

-Neil


Re: mod_proxy distinguish cookies?

2004-05-04 Thread Roy T. Fielding
Rather just use URL parameters. As I recall RFC2616 does not consider a
request with a different cookie a different variant, so even if you
patch your server to allow it to differentiate between cookies, neither
the browsers nor the transparent proxies in the path of the request will
do what you want them to do :(

Well, that truly sucks. If you pass options around in params then
whenever someone follows a link posted by someone else, they will
inherit that person's options.

I do wish people would read the specification to refresh their memory
before summarizing.  RFC 2616 doesn't say anything about cookies -- it
doesn't have to because there are already several mechanisms for marking
a request or response as varying.  In this case
   Vary: Cookie

added to the response by the server module (the only component capable
of knowing how the resource varies) is sufficient for caching clients
that are compliant with HTTP/1.1.  Expires and Cache-Control are usually
added as well if HTTP/1.0 caches are a problem.
Roy
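
As a rough illustration of the Vary mechanism described above, here is a minimal sketch in Python of a cache that selects a stored variant by comparing the request headers named in the response's Vary header. The VaryCache class and its store/lookup methods are hypothetical names made up for this sketch; this is not how mod_proxy or any real cache is implemented, and freshness checks (Expires, Cache-Control) are left out entirely.

```python
def _lower(headers):
    """Return a copy of a header dict with lower-cased names."""
    return {name.lower(): value for name, value in headers.items()}


class VaryCache:
    """Toy cache (hypothetical) that picks variants using the Vary header."""

    def __init__(self):
        # url -> list of (vary_names, selected_values, body)
        self._variants = {}

    def store(self, url, request_headers, response_headers, body):
        req = _lower(request_headers)
        vary = _lower(response_headers).get("vary", "")
        names = tuple(sorted(n.strip().lower()
                             for n in vary.split(",") if n.strip()))
        if "*" in names:
            # "Vary: *" means the response cannot be matched to a
            # later request by the cache, so do not keep it.
            return
        selected = tuple(req.get(n, "") for n in names)
        self._variants.setdefault(url, []).append((names, selected, body))

    def lookup(self, url, request_headers):
        req = _lower(request_headers)
        for names, selected, body in self._variants.get(url, []):
            # A stored variant matches only if every header named in
            # its Vary list has the same value in the new request.
            if tuple(req.get(n, "") for n in names) == selected:
                return body
        return None


cache = VaryCache()
cache.store("/page", {"Cookie": "pics=small"}, {"Vary": "Cookie"}, "small page")
print(cache.lookup("/page", {"Cookie": "pics=small"}))
print(cache.lookup("/page", {"Cookie": "pics=large"}))
```

Note that in this sketch the whole Cookie header value is compared, so any unrelated cookie (a session or tracking cookie, say) also produces a separate variant. That is the fragmentation concern raised elsewhere in this thread.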



Re: mod_proxy distinguish cookies?

2004-05-04 Thread Graham Leggett
Roy T. Fielding wrote:

I do wish people would read the specification to refresh their memory
before summarizing.  RFC 2616 doesn't say anything about cookies -- it
doesn't have to because there are already several mechanisms for marking
a request or response as varying.  In this case
   Vary: Cookie

added to the response by the server module (the only component capable
of knowing how the resource varies) is sufficient for caching clients
that are compliant with HTTP/1.1.

My sentence "RFC2616 does not consider a request with a different cookie
a different variant" should have read "RFC2616 does not recognise
cookies specifically at all, as they are just another header". I did not
think of the Vary case, sorry for the confusion.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

2004-05-03 Thread Neil Gunton
Graham Leggett wrote:
 
 Neil Gunton wrote:
 
  The problem now is that the browsers (IE and Mozilla at least) don't
  seem to differentiate requests based on cookies. I have tested
  requesting a page with a certain cookie (where the page has a sufficient
  expiration to warrant being cached for the duration of the test), and
  then changing the cookie, and re-requesting the same page as before. The
  cookie is different, but the browsers still seem to use their local
  cached copy of the page. So, I am currently thinking that the solution
  to this is to use a combination of cookies and URL parameters to make
  the requests look different.
 
 Rather just use URL parameters. As I recall RFC2616 does not consider a
 request with a different cookie a different variant, so even if you
 patch your server to allow it to differentiate between cookies, neither
 the browsers nor the transparent proxies in the path of the request will
 do what you want them to do :(

Well, that truly sucks. If you pass options around in params then
whenever someone follows a link posted by someone else, they will
inherit that person's options. The only alternative might be to make
pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel
directive (which I haven't tried, but I assume that's what it does)...
so even though my server will get hit a lot more, at least it might be
stopped by the proxy rather than hitting the mod_perl.

From what you are saying, it would appear that HTTP is broken with
regard to cookies and caching. I thought they had all that sorted out a
while back. Never mind...

Thanks for the insight, I'll have to think about this some more it
seems. Either have extremely volatile options via URL params with page
caching, or no caching (outside of my server, which would mean a LOT
more traffic since every time someone hits 'Back' on their browser it
would think it had to re-get the page) and persistent options. Hmmm...

Any other ideas would be welcomed, but right now that's about all I can
think of...

Thanks again,

-Neil


Re: mod_proxy distinguish cookies?

2004-05-03 Thread Graham Leggett
Neil Gunton wrote:

The problem now is that the browsers (IE and Mozilla at least) don't
seem to differentiate requests based on cookies. I have tested
requesting a page with a certain cookie (where the page has a sufficient
expiration to warrant being cached for the duration of the test), and
then changing the cookie, and re-requesting the same page as before. The
cookie is different, but the browsers still seem to use their local
cached copy of the page. So, I am currently thinking that the solution
to this is to use a combination of cookies and URL parameters to make
the requests look different.

Rather just use URL parameters. As I recall RFC2616 does not consider a 
request with a different cookie a different variant, so even if you 
patch your server to allow it to differentiate between cookies, neither 
the browsers nor the transparent proxies in the path of the request will 
do what you want them to do :(

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

2004-04-26 Thread Graham Leggett
Igor Sysoev wrote:

mod_accel ( http://sysoev.ru/en/ ) allows cookies to be taken into account
while caching:
AccelCacheCookie  some_cookie_name another_cookie_name

You can set it on a per-location basis.

Besides, my upcoming light-weight HTTP and reverse proxy server nginx
will allow this too.

Double check first whether this is allowed by RFC2616 - remember that 
the Apache mod_proxy is very unlikely to be the only proxy in the chain, 
so even if mod_proxy takes cookies into account, other caches in the 
chain might not.

Also, it is unlikely that changes will be made to the v1.3 tree code; it 
is more likely that such a feature might be found in the mod_cache 
modules of Apache v2.0.

The original design of mod_cache allowed for the caching of different 
variants of the same URL (different variants might be different 
languages of the same page, etc), though I am not sure if that feature 
currently works. If it does, that would be what you need.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

2004-04-25 Thread Igor Sysoev
On Sat, 24 Apr 2004, Neil Gunton wrote:

 Neil Gunton wrote:
 
  Hi all,
 
  I apologise in advance if this is obvious or has otherwise been answered
  elsewhere, but I can't seem to find any reference to it.
 
  I am using Apache 1.3.29 with mod_perl, on Linux 2.4. I am running
  mod_proxy as a caching reverse proxy front end, and mod_perl on the
  backend. This works really well, but I have noticed that mod_proxy does
  not seem to be able to distinguish requests as being different if the
  URLs are the same, but they contain different cookies. I would like to
  be able to enable more personalization on my site, which would best be
  done using cookies. The problem is that when a page has an expiration
  greater than 'now', then any request to the same URL will get the cached
  version, even if the requests have different cookies. Currently I have
  to pass options around as part of the URL in order to make the requests
  look different to mod_proxy.
 
  Am I missing something here? Or, will this be included in either future
  versions of mod_proxy or the equivalent module in Apache 2.x? Any
  insights greatly appreciated.

 I should perhaps make clear that I do have cookies working through the
 proxy just fine, for pages that are set to be 'no-cache'. So this isn't
 an issue with the proxy being able to pass cookies to/from the backend
 and browser (which I think I have seen mentioned before as a bugfix),
 but rather with mod_proxy simply being able to distinguish otherwise
 identical URL requests that have different cookies, and cache those as
 different requests.

 So for example, the request GET /somedir/somepage.html?xxx=yyy passed
 with a cookie that has the value 'pics=small' should be seen as different from
 another identical request, but with cookie value 'pics=large'. Currently
 my tests indicate that mod_proxy returns the same cached page for each
 request.

 I assume that mod_proxy only checks the actual request string, and not
 the HTTP header which contains the cookie.

 Obviously, under this scheme, if you were using cookies to track
 sessions then all requests would get passed to the backend server - so,
 perhaps it would be a nice additional feature to be able to configure,
 through httpd.conf, how mod_proxy (or its successor) pays attention to
 cookies. For example, you might say something to the effect of "ignore
 this cookie" or "differentiate requests using this cookie". Then we
 could have sitewide options like e.g. 'pics' (to set what size pictures
 are shown), and this could be used to distinguish cached pages, but
 other cookies might be ignored on some pages. This would allow for more
 flexibility, with some cached pages being sensitive to cookies, while
 others are not. An obvious way this would be useful is in the use of
 login cookies. These will be passed in by the browser for every page on
 the site, but this doesn't mean we want to distinguish cached pages
 based on it for every page. Some user-specific pages would have
 'no-cache' set, while other pages could be set to ignore this login
 cookie, thus gaining the benefits of the proxy caching. This would be
 useful for pages that have no user-specific or personalizable aspects -
 they could be cached regardless of who is logged in.

 Sorry if this wasn't clear from the original post, just wanted to
 clarify and expand... any advice on this would be VERY welcomed, since
 my options with personalization are currently rather limited.

 Also, if this is actually addressed to the wrong list for some reason
 then a pointer would be much appreciated...

mod_accel ( http://sysoev.ru/en/ ) allows cookies to be taken into account
while caching:

AccelCacheCookie  some_cookie_name another_cookie_name

You can set it on a per-location basis.

Besides, my upcoming light-weight HTTP and reverse proxy server nginx
will allow this too.
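
To make the idea of caching on selected cookies concrete, here is a minimal sketch in Python of a cache key built from the URL plus only the named cookies. It illustrates the general technique, not mod_accel's actual implementation; the cache_key function and its key_cookies parameter are hypothetical names invented for this example.

```python
from http.cookies import SimpleCookie


def cache_key(url, cookie_header, key_cookies):
    """Build a cache key from the URL and the listed cookies only.

    key_cookies plays the role of a configured list of cookie names
    (hypothetical here); every other cookie in the request is ignored.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    parts = [url]
    for name in sorted(key_cookies):
        morsel = jar.get(name)
        # A missing cookie contributes an empty value to the key.
        parts.append(f"{name}={morsel.value if morsel else ''}")
    return "|".join(parts)


# Same 'pics' option, different session cookie: the keys are equal.
a = cache_key("/somedir/somepage.html?xxx=yyy", "pics=small; sid=1", ["pics"])
b = cache_key("/somedir/somepage.html?xxx=yyy", "sid=2; pics=small", ["pics"])
print(a == b)
```

Because only the listed cookies enter the key, a session or tracking cookie does not split the cache, while a different 'pics' value does.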


Igor Sysoev
http://sysoev.ru/en/


Re: mod_proxy distinguish cookies?

2004-04-24 Thread Neil Gunton
Neil Gunton wrote:
 
 Hi all,
 
 I apologise in advance if this is obvious or has otherwise been answered
 elsewhere, but I can't seem to find any reference to it.
 
 I am using Apache 1.3.29 with mod_perl, on Linux 2.4. I am running
 mod_proxy as a caching reverse proxy front end, and mod_perl on the
 backend. This works really well, but I have noticed that mod_proxy does
 not seem to be able to distinguish requests as being different if the
 URLs are the same, but they contain different cookies. I would like to
 be able to enable more personalization on my site, which would best be
 done using cookies. The problem is that when a page has an expiration
 greater than 'now', then any request to the same URL will get the cached
 version, even if the requests have different cookies. Currently I have
 to pass options around as part of the URL in order to make the requests
 look different to mod_proxy.
 
 Am I missing something here? Or, will this be included in either future
 versions of mod_proxy or the equivalent module in Apache 2.x? Any
 insights greatly appreciated.

I should perhaps make clear that I do have cookies working through the
proxy just fine, for pages that are set to be 'no-cache'. So this isn't
an issue with the proxy being able to pass cookies to/from the backend
and browser (which I think I have seen mentioned before as a bugfix),
but rather with mod_proxy simply being able to distinguish otherwise
identical URL requests that have different cookies, and cache those as
different requests.

So for example, the request GET /somedir/somepage.html?xxx=yyy passed
with a cookie that has the value 'pics=small' should be seen as different from
another identical request, but with cookie value 'pics=large'. Currently
my tests indicate that mod_proxy returns the same cached page for each
request.

I assume that mod_proxy only checks the actual request string, and not
the HTTP header which contains the cookie.

Obviously, under this scheme, if you were using cookies to track
sessions then all requests would get passed to the backend server - so,
perhaps it would be a nice additional feature to be able to configure,
through httpd.conf, how mod_proxy (or its successor) pays attention to
cookies. For example, you might say something to the effect of "ignore
this cookie" or "differentiate requests using this cookie". Then we
could have sitewide options like e.g. 'pics' (to set what size pictures
are shown), and this could be used to distinguish cached pages, but
other cookies might be ignored on some pages. This would allow for more
flexibility, with some cached pages being sensitive to cookies, while
others are not. An obvious way this would be useful is in the use of
login cookies. These will be passed in by the browser for every page on
the site, but this doesn't mean we want to distinguish cached pages
based on it for every page. Some user-specific pages would have
'no-cache' set, while other pages could be set to ignore this login
cookie, thus gaining the benefits of the proxy caching. This would be
useful for pages that have no user-specific or personalizable aspects -
they could be cached regardless of who is logged in.
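
The per-page cookie handling proposed above could be sketched roughly as follows, in Python. Everything here is an assumption made for illustration: the POLICY table, its path prefixes, and the cache_key function are hypothetical, and mod_proxy has no such configuration as far as this thread establishes.

```python
def parse_cookies(header):
    """Split a Cookie header into a name -> value dict (simplified)."""
    pairs = (item.split("=", 1) for item in header.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


# Hypothetical policy table; the longest matching path prefix wins.
POLICY = {
    "/account/": None,        # user-specific pages: never use the cache
    "/journals/": ("pics",),  # differentiate on the 'pics' option only
    "/": (),                  # everything else: ignore all cookies
}


def cache_key(path, cookie_header):
    """Return a cache key for the request, or None to bypass the cache."""
    prefix = max((p for p in POLICY if path.startswith(p)), key=len)
    names = POLICY[prefix]
    if names is None:
        return None
    cookies = parse_cookies(cookie_header)
    return (path,) + tuple((n, cookies.get(n, "")) for n in names)


# The login cookie is ignored; only 'pics' distinguishes cached pages.
print(cache_key("/journals/1", "login=neil; pics=small"))
print(cache_key("/account/settings", "login=neil"))
```

With a table like this, a login cookie sent on every request does not defeat caching for pages that have no user-specific content, while pages under the bypass prefix always go to the backend.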

Sorry if this wasn't clear from the original post, just wanted to
clarify and expand... any advice on this would be VERY welcomed, since
my options with personalization are currently rather limited.

Also, if this is actually addressed to the wrong list for some reason
then a pointer would be much appreciated...

Thanks again,

-Neil