Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Mark Nottingham
On 02/06/2010, at 9:00 AM, toki...@aol.com wrote:

  Sergey wrote...
  That's new to me that browsers don't cache stuff that has Vary only on 
  Accept-Encoding - can you post some statistics or describe the test you ran?
 
 Test results and statistics...
 
 Apache DEV forum...
 http://www.pubbs.net/200908/httpd/55434-modcache-moddeflate-and-vary-user-agent.html

I don't see anything there but anecdotal evidence, certainly no reproducible 
tests.

 apache-modgzip forum...
 http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

Seven and a half years old, and again anecdotal.

 Etc, etc. Lots of discussion about this has taken place over
 on the SQUID forums as well.

Yes; most of it in the past few years surrounding the ETag bug in Apache, not 
browser bugs. 

Regards,


--
Mark Nottingham http://www.mnot.net/



Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread tokiley

 Mark Nottingham wrote...

 On 02/06/2010, at 9:00 AM, toki...@aol.com wrote:

   Sergey wrote...
   That's new to me that browsers don't cache stuff that has Vary only on 
   Accept-Encoding - can you post some statistics or describe the test you 
   ran?
  
  Test results and statistics...
  
  Apache DEV forum...
  http://www.pubbs.net/200908/httpd/55434-modcache-moddeflate-and-vary-user-agent.html

 I don't see anything there but anecdotal evidence

I think you need to do a reboot on your definition of 'anecdotal'.

The thread above was a focused discussion about what ACTUALLY
happens if you try to 'Vary:' on 'User-Agent' in the real world
these days accompanied by some additional (relevant) information about
what COULD (actually) happen if you (alternatively) try to 'Vary:' on
'Accept-encoding:'. If you still think any of it 'lacks veracity'
and is 'not trustworthy' then my only suggestion would be to spend
a little time on Google or Bling. It's an ongoing 'story'.

 certainly no reproducible tests.

What sort of tests would you like to see?

Anyone with access to certain browsers can 'reproduce'
the reported results.

 apache-modgzip forum...
 http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

 Seven and a half years old, 

Yea. Wow. Boggles the mind that it's still relevant, doesn't it?

 and again anecdotal.

See above regarding use/misuse of the word 'anecdotal'. 

The tests (described in the link) were done using a kernel debugger
and a lot of those (unpatched) browsers are still in use TODAY.
I've heard kernel debuggers called a lot of things but 'anecdotal'
is not on the list.

 Etc, etc. Lots of discussion about this has taken place over
 on the SQUID forums as well.

 Yes; most of it in the past few years surrounding the ETag bug in Apache, not 
 browser bugs. 

 Regards,
 Mark Nottingham

The 2.5 release of SQUID ( Early 2004 ) was the very FIRST version of that 
Proxy Server that made any attempt to handle 'Vary:' headers at all. Prior to
that, they were just doing the same thing all the browsers would. If a 'Vary:'
header of ANY description arrived in the stream, it was simply treated as if
it was 'Vary: *' ( STAR ) and there was no attempt to cache it at all.

There was a huge discussion about ALL of this in late December of 2003 
over in SQUID land as they were trying to get 2.5 out the door.

I believe, at that time, it was Robert Collins who got 'tagged' to do the 
'Vary:' part and Henrik Nordstrom took the whole 'ETag' part on his shoulders.

If you Google 'Vary Accept-Encoding Browsers SQUID' but also include
Robert Collins' name you'll find more than if you use Henrik's name
since he was ultra-focused on the ETag thing. ( He still is ).

Regardless, they were both VERY MUCH interested in the 'Browser bugs' 
surrounding all of this since they both realized that SQUID was going to 
'take the heat' if/when the whole 'Vary:' scheme came alive and things suddenly 
got weird 'on the last mile'.

In the end, they did a good job implementing 'Vary:' in SQUID 2.5 but it
really has been an ongoing 'adventure' that continues to this very day.

Only about 12 months ago one of the SQUID users' forums lit up with another
'discovered' problem surrounding all this 'Vary:' stuff and this had
to do with non-compliance on the actual 'Accept-Encoding:' fields
themselves coming from Browsers/User-Agents. ( Browser BUGS ).
In some cases the newly discovered problem reflects the same nightmare 
seen TODAY with the out-of-control use of 'User-Agent'. 

Too many variants being generated.

Squid User's Forum...
http://www.pubbs.net/200904/squid/57482-re-squid-users-strange-problem-regarding-accept-encoding-and-compression-regex-anyone.html

Here's just a sampling of what was being shown from REAL WORLD
Server logs just 12 months ago... 

Accept-Encoding: , FFF
Accept-Encoding: mzip, meflate
Accept-Encoding: identity, deflate, gzip
Accept-Encoding: gzip;q=1.0, deflate;q=0.8, chunked;q=0.6,
identity;q=0.4, *;q=0
Accept-Encoding: gzip, deflate, x-gzip, identity; q=0.9
Accept-Encoding: gzip,deflate,bzip2
Accept-Encoding: ndeflate
Accept-Encoding: x-gzip, gzip
Accept-Encoding: gzip,identity
Accept-Encoding: gzip, deflate, compress;q=0.9
Accept-Encoding: gzip,deflate,X.509
Yada, yada, yada...

To this day... not even Firefox and MSIE 7 'do the same thing'
with regards to this header. Though SEMANTICALLY identical... the 
following is STILL causing some problems for people that weren't
tickety-boo with their parsing code...

Firefox sends this...
Accept-Encoding: gzip,deflate

MSIE sends this...
Accept-Encoding: gzip, deflate

That one even bit the SQUID folks in the butt for a couple of revisions and
they are STILL trying to arrive at the best 'normalization' parsing for 
this sort of thing.
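
To make the normalization problem concrete, here is a minimal Python sketch. The function name normalize_accept_encoding and its rules (lowercase, trim whitespace, drop empty elements and codings with q=0, sort) are assumptions for illustration only; this is not what SQUID or any other cache actually does.

```python
def normalize_accept_encoding(value):
    """Reduce an Accept-Encoding value to a canonical, order-independent string.

    Hypothetical helper: the rules below are an illustrative assumption,
    not the behaviour of any particular cache.
    """
    codings = set()
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue  # skip empty list elements such as the one in ", FFF"
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        if q > 0:
            codings.add(name)
    return ",".join(sorted(codings))
```

With rules like these, the Firefox and MSIE forms shown above collapse to the same string, so a cache would keep one variant for both instead of two.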

Here's a thread from less than 60 days ago detailing this 'How do we normalize
this Accept-Encoding stuff?' issue with SQUID...

Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Mark Nottingham

On 04/06/2010, at 6:51 PM, toki...@aol.com wrote:
 
 I think you need to do a reboot on your definition of 'anecdotal'.

Good for you.


 The thread above was a focused discussion about what ACTUALLY
 happens if you try to 'Vary:' on 'User-Agent' in the real world
 these days accompanied by some additional (relevant) information about
 what COULD (actually) happen if you (alternatively) try to 'Vary:' on
 'Accept-encoding:'. If you still think any of it 'lacks veracity'
 and is 'not trustworthy' then my only suggestion would be to spend
 a little time on Google or Bling. It's an ongoing 'story'.

I'm not sure why you're using so many quotes, unless you're trying to put words 
into my mouth. Please stop.


  certainly no reproducible tests.
 
 What sort of tests would you like to see?

Ones that can be reproduced. Preferably in an automated fashion, or at least 
with demonstrable proof. Waving around the phrase kernel debugger doesn't 
count.


 The 2.5 release of SQUID ( Early 2004 ) was the very FIRST version of that 
 Proxy Server that made any attempt to handle 'Vary:' headers at all. Prior to
 that, they were just doing the same thing all the browsers would. If a 'Vary:'
 header of ANY description arrived in the stream, it was simply treated as if
 it was 'Vary: *' ( STAR ) and there was no attempt to cache it at all.

What's your point? The deployment footprint of 2.4 is vanishingly small, given 
that it had a LOT of bugs, hasn't been supported for years, and still uses 
select/poll. 


 If you Google 'Vary Accept-Encoding Browsers SQUID' but also include
 Robert Collins' name you'll find more than if you use Henrik's name
 since he was ultra-focused on the ETag thing. ( He still is ).

Yes, as am I, and Roy for that matter, last time I talked to him about it.


 Only about 12 months ago one of the SQUID users' forums lit up with another
 'discovered' problem surrounding all this 'Vary:' stuff and this had
 to do with non-compliance on the actual 'Accept-Encoding:' fields
 themselves coming from Browsers/User-Agents. ( Browser BUGS ).
 In some cases the newly discovered problem reflects the same nightmare 
 seen TODAY with the out-of-control use of 'User-Agent'. 

It's not a bug in the implementations, it's a grey area in 2616 that HTTPbis 
has since worked to resolve; 
  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/147


 Too many variants being generated.
 
 Squid User's Forum...
 http://www.pubbs.net/200904/squid/57482-re-squid-users-strange-problem-regarding-accept-encoding-and-compression-regex-anyone.html
 
 Here's just a sampling of what was being shown from REAL WORLD
 Server logs just 12 months ago... 
 
 Accept-Encoding: , FFF
 Accept-Encoding: mzip, meflate
 Accept-Encoding: identity, deflate, gzip
 Accept-Encoding: gzip;q=1.0, deflate;q=0.8, chunked;q=0.6,
 identity;q=0.4, *;q=0
 Accept-Encoding: gzip, deflate, x-gzip, identity; q=0.9
 Accept-Encoding: gzip,deflate,bzip2
 Accept-Encoding: ndeflate
 Accept-Encoding: x-gzip, gzip
 Accept-Encoding: gzip,identity
 Accept-Encoding: gzip, deflate, compress;q=0.9
 Accept-Encoding: gzip,deflate,X.509
 Yada, yada, yada...

Yes, yes, but in the REAL WORLD (as you like to say), there are only a few 
common browser families, and there is a high degree of similarity within those 
families. Caches may see some duplication, but the replacement algorithms will 
generally do the right thing. In the meantime, we'll fix header normalisation 
in Squid, Traffic Server and other caches.

I'm not necessarily agreeing with those who say that GZIP should be turned on 
by default in Apache now, but I hate to see the argument against it made with 
so many shoddy straw-men.


 People get REALLY PISSED these days when everything was running along
 just fine and suddenly there are 'problems'. Heads can roll.

Why don't you just shout BOO and get it over with?

*shakes head*



--
Mark Nottingham http://www.mnot.net/



Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Brian Pane
On Fri, Jun 4, 2010 at 2:18 AM, Mark Nottingham m...@mnot.net wrote:
[...]
 It's not a bug in the implementations, it's a grey area in 2616 that HTTPbis 
 has since worked to resolve;
  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/147

By my reading of the attachments in that ticket, servers (including caches)
would be required to treat the following as equivalent to each other:
Accept-Encoding: gzip, deflate
Accept-Encoding: gzip,deflate
Accept-Encoding: deflate, gzip
Accept-Encoding: deflate,gzip
and the following as different from each other:
Accept-Encoding: gzip,deflate
Accept-Encoding: gzip

If so, the RFC 2616 patch would basically codify current good
practices in cache configuration (I recall that the Varnish docs, for
example, recommend normalizing the Accept-Encoding before using it in
a cache key), and as such it would be a step forward.

In practice, given a cache that implements these equivalence rules and
an origin server that sets a Vary header on Accept-Encoding, I'd
expect the cache to end up holding up to three copies of each object:
1. compressed, with a cache key of something like URI+gzip,deflate
2. compressed, with a cache key of URI+gzip
3. uncompressed, with a cache key of URI+

That's fewer copies of the object than the cache would end up with if
it did a strict text match on the different permutations of
gzip,deflate; but it's still a lot of copies.
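
The difference between the two matching strategies can be sketched in a few lines of Python. The cache_key function and the "URI+codings" key layout are hypothetical, chosen to mirror the keys listed above; q-values are ignored to keep the sketch short.

```python
def cache_key(uri, accept_encoding):
    """Build a variant key from the URI and the set of acceptable codings.

    Hypothetical key layout; order and whitespace in the header are ignored.
    """
    tokens = {t.strip().lower() for t in accept_encoding.split(",")}
    tokens.discard("")
    return uri + "+" + ",".join(sorted(tokens))


requests = [
    "gzip, deflate", "gzip,deflate", "deflate, gzip", "deflate,gzip",
    "gzip", "",
]

# Keys under the equivalence rules versus a strict text match on the header.
keys = {cache_key("/index.html", ae) for ae in requests}
strict_keys = {"/index.html+" + ae for ae in requests}
print(len(keys), len(strict_keys))
```

For these six request headers the normalized keys collapse to three variants, while the strict text match keeps all six.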

So I have to ask: why not reduce the number of copies to just one, by
turning Content-Encoding into a hop-by-hop header and deprecating the
use of Vary to indicate an Accept-Encoding-based variation?  Cache
implementors could then choose their own policies:
- Store one copy of the object, compressed, to optimize for memory use
- Store compressed and uncompressed copies of the object, to optimize
for CPU use
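
The first policy (one compressed copy, inflated on demand) might look roughly like the following Python sketch. OneCopyCache and its methods are hypothetical names, q-values are ignored, and a real intermediary would also have to deal with validators and streaming.

```python
import gzip


class OneCopyCache:
    """Hypothetical cache that keeps a single gzip-compressed copy per URI."""

    def __init__(self):
        self._store = {}

    def put(self, uri, body):
        # Store only the compressed form to save memory.
        self._store[uri] = gzip.compress(body)

    def get(self, uri, accept_encoding):
        """Return (body, content_encoding) suited to this client."""
        stored = self._store[uri]
        tokens = {t.split(";")[0].strip().lower()
                  for t in accept_encoding.split(",")}
        if "gzip" in tokens:
            return stored, "gzip"
        # Client did not list gzip: spend CPU to inflate the stored copy.
        return gzip.decompress(stored), None
```

The trade-off is the one described above: memory is saved at the cost of a decompression for every client that does not accept gzip.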

Brian


RE: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Plüm, Rüdiger, VF-Group
 

 -----Original Message-----
 From: Brian Pane [mailto:brianp...@gmail.com] 
 Sent: Freitag, 4. Juni 2010 14:39
 To: dev@httpd.apache.org
 Subject: Re: canned deflate conf in manual -- time to drop 
 the NS4/vary?
 
 On Fri, Jun 4, 2010 at 2:18 AM, Mark Nottingham m...@mnot.net wrote:
 [...]
  It's not a bug in the implementations, it's a grey area in 
 2616 that HTTPbis has since worked to resolve;
   http://trac.tools.ietf.org/wg/httpbis/trac/ticket/147
 
 By my reading of the attachments in that ticket, servers 
 (including caches)
 would be required to treat the following as equivalent to each other:
 Accept-Encoding: gzip, deflate
 Accept-Encoding: gzip,deflate
 Accept-Encoding: deflate, gzip
 Accept-Encoding: deflate,gzip
 and the following as different from each other:
 Accept-Encoding: gzip,deflate
 Accept-Encoding: gzip
 
 If so, the RFC 2616 patch would basically codify current good
 practices in cache configuration (I recall that the Varnish docs, for
 example, recommend normalizing the Accept-Encoding before using it in
 a cache key), and as such it would be a step forward.
 
 In practice, given a cache that implements these equivalence rules and
 an origin server that sets a Vary header on Accept-Encoding, I'd
 expect the cache to end up holding up to three copies of each object:
 1. compressed, with a cache key of something like URI+gzip,deflate
 2. compressed, with a cache key of URI+gzip
 3. uncompressed, with a cache key of URI+
 
 That's fewer copies of the object than the cache would end up with if
 it did a strict text match on the different permutations of
 gzip,deflate; but it's still a lot of copies.
 
 So I have to ask: why not reduce the number of copies to just one, by
 turning Content-Encoding into a hop-by-hop header and deprecating the

Isn't that what Transfer-Encoding is designed for?

Regards

Rüdiger



Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Brian Pane
On Fri, Jun 4, 2010 at 6:10 AM, Plüm, Rüdiger, VF-Group
ruediger.pl...@vodafone.com wrote:
[...]
 Isn't that what Transfer-Encoding is designed for?

Yes, and in fact if we were talking about a brand new protocol, I'd
probably argue in favor of putting the compression specifier in the
Transfer-Encoding.  I think a change in the semantics of
Content-Encoding in HTTP/1.1 might be a way to obtain similar benefits
without breaking existing software.

-Brian


Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-04 Thread Mark Nottingham
Changing the semantics of Accept-Encoding / Content-Encoding is likely out of 
scope for HTTPbis; I have a hard time believing it wouldn't make existing 
implementations non-conformant, which we can really only do if there's a 
serious security or interoperability concern.

OTOH I think it would be reasonably easy to change Squid and other 
intermediaries to pass through TE; i.e., if the client asks for hop-by-hop 
compression, ask the server for hop-by-hop compression as well (so they don't 
have to dynamically decompress and possibly buffer responses for clients that 
don't support it). 

The question would be whether any reasonable number of browsers would start 
sending TE. Given that both Chrome and FF are on a perf kick these days, I 
think it's possible. The problem with hop-by-hop compression has always been 
that no-one else does it... if Apache were to start, that would be a step.

Interestingly, it appears that Mozilla had partial support at one time:
  http://www-archive.mozilla.org/projects/apache/gzip/

 Here we hope to use the new HTTP1.1 TE: gzip header to request compressed 
 versions of HTML files. Then the server would need to do streaming 
 compression to generate the results. To minimize the overhead on the server 
 it should keep a cache of the compressed files to quickly fill future 
 requests for the same compressed data.
 
 The current Mozilla source can already accept and decode Transfer-encoding: 
 gzip data, but does not currently send the TE: header.

but I can't find the corresponding code in the current Mozilla source.

Regards,


On 04/06/2010, at 11:36 PM, Brian Pane wrote:

 On Fri, Jun 4, 2010 at 6:10 AM, Plüm, Rüdiger, VF-Group
 ruediger.pl...@vodafone.com wrote:
 [...]
 Isn't that what Transfer-Encoding is designed for?
 
 Yes, and in fact if we were talking about a brand new protocol, I'd
 probably argue in favor of putting the compression specifier in the
 Transfer-Encoding.  I think a change in the semantics of
 Content-Encoding in HTTP/1.1 might be a way to obtain similar benefits
 without breaking existing software.
 
 -Brian


--
Mark Nottingham http://www.mnot.net/



Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-01 Thread Sergey Chernyshev
Yeah, it should only Vary on Accept-encoding (already does). It's still not
perfect, but at least it doesn't blow up proxies too much.

The question to people with statistics - are there any other issues with
gzip/proxy configurations?

Sergey


On Tue, Jun 1, 2010 at 11:01 AM, Eric Covener cove...@gmail.com wrote:

 IIUC, the vary: user-agent to accommodate Netscape 4 is a pain for
 caches because obviously they can only vary on the entire user-agent.

 http://httpd.apache.org/docs/2.2/mod/mod_deflate.html

 Is it time to move this aspect of the snippet into a separate note or
 some historical trivia section, to remove the Vary?

 --

 On the same topic, are there still non-academic CSS and JS compression
 issues (e.g. XP-era browsers, earlier, later, ???)  Should we instead
 account for these in the complicated/more compression example, and
 is there a way to do so without adding the Vary right back in?


 --
 Eric Covener
 cove...@gmail.com



Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-01 Thread tokiley

 Don't forget the ongoing issue that if you ONLY vary on 'Accept-Encoding'
then almost ALL browsers will then refuse to cache a response entity LOCALLY
and the pain factor moves directly to the Proxy/Content Server(s).

If you vary on 'User-Agent' ( No longer reasonable because of the abuse
of that header 'out there'? ) then the browsers WILL cache responses
locally and the pain is reduced at the Proxy/Content server level, but
pie is not free at a truck stop and there are then OTHER issues to deal with.

The OTHER 'ongoing issue' regarding compression is that, to this day,
it still ONLY works for a limited set of MIME types. The
'Accept-Encoding: gzip,deflate' header coming from ALL major browsers
is still mostly a LIE. It would seem to indicate that the MIME type
doesn't matter and it will 'decode' for ANY MIME type but nothing could
be further from the truth. There is no browser on the planet that will
'Accept-Encoding' for ANY/ALL MIME type(s).

If you are going to turn compression ON by default, without the user having to
make any decisions for their particular environment, then part of the decision
for the default config has to be 'Which MIME types?'  text/plain and/or
text/html only? SOME browsers can 'Accept-Encoding' on the ever-increasing
.js Javascript backloads but some CANNOT.
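
The 'Which MIME types?' decision amounts to a whitelist check before compressing. A minimal Python sketch follows; COMPRESSIBLE_TYPES and should_compress are hypothetical names, and the two types listed are only the conservative choice mentioned above, not a recommendation.

```python
# Assumed conservative whitelist; a real default would need to be argued for.
COMPRESSIBLE_TYPES = {"text/html", "text/plain"}


def should_compress(content_type, accept_encoding):
    """Compress only whitelisted media types for clients that list gzip."""
    media_type = content_type.split(";")[0].strip().lower()
    codings = {t.split(";")[0].strip().lower()
               for t in accept_encoding.split(",")}
    return media_type in COMPRESSIBLE_TYPES and "gzip" in codings
```

Anything outside the whitelist is sent uncompressed even when the client advertises gzip, which sidesteps browsers that cannot decode a given type.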

These 2 issues alone are probably enough to justify keeping compression 
OFF by default. A lot of people that use Apache won't even be able to get 
their heads around either one of these 'issues' and they really SHOULD
do a little homework before turning it ON.

Someone already quoted that...

'people expect the default config to just WORK without major issues'.

That's exactly what you have now.
It's not 'broken'.
Why change it?

Kevin Kiley



 

-----Original Message-----
From: Sergey Chernyshev sergey.chernys...@gmail.com
To: dev@httpd.apache.org
Sent: Tue, Jun 1, 2010 3:09 pm
Subject: Re: canned deflate conf in manual -- time to drop the NS4/vary?


Yeah, it should only Vary on Accept-encoding (already does). It's still not 
perfect, but at least it doesn't blow up proxies too much.


The question to people with statistics - are there any other issues with 
gzip/proxy configurations?


Sergey



On Tue, Jun 1, 2010 at 11:01 AM, Eric Covener cove...@gmail.com wrote:

IIUC, the vary: user-agent to accommodate Netscape 4 is a pain for
caches because obviously they can only vary on the entire user-agent.

http://httpd.apache.org/docs/2.2/mod/mod_deflate.html

Is it time to move this aspect of the snippet into a separate note or
some historical trivia section, to remove the Vary?

--

On the same topic, are there still non-academic CSS and JS compression
issues (e.g. XP-era browsers, earlier, later, ???)  Should we instead
account for these in the complicated/more compression example, and
is there a way to do so without adding the Vary right back in?


--
Eric Covener
cove...@gmail.com



 


Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-01 Thread Nick Kew
On Tue, 01 Jun 2010 17:44:41 -0400
toki...@aol.com wrote:

 
  Don't forget the ongoing issue that if you ONLY vary on 'Accept-Encoding'
 then almost ALL browsers will then refuse to cache a response entity LOCALLY

Really?  That sounds bizarre!  Do you have a reference for it?

-- 
Nick Kew


Re: canned deflate conf in manual -- time to drop the NS4/vary?

2010-06-01 Thread tokiley
 Sergey wrote...
 That's new to me that browsers don't cache stuff that has Vary only on 
 Accept-Encoding - can you post some statistics or describe the test you ran?

Test results and statistics...

Apache DEV forum...
http://www.pubbs.net/200908/httpd/55434-modcache-moddeflate-and-vary-user-agent.html

apache-modgzip forum...
http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

Etc, etc. Lots of discussion about this has taken place over
on the SQUID forums as well.

 As for *all* content types, I don't think we're talking about compressing
 images and it's relatively easy to create a white-list to have gzip on by
 default.

Apache's own mod_deflate docs show how to exclude images.
That's a no-brainer.
It's the OTHER mime types that get hairy.

 The question regarding support in browsers actually is very serious too and
 I'd love to see statistics for that too - it sounds too scary and
 middle-ages to me.

You must be new to this sort of thing.
See links above and read the MANY related threads on the SQUID forum.

 I didn't get this impression from all the talks about gzip and research
 that guys from Google did, for example, when they were looking for a source
 of lower gzip rates (it turned out to be antivirus software stripping
 Accept-encoding headers).

I think I know the Google R/D you are referring to and it was almost a joke.
There was a LOT of research they did NOT do and they made many assumptions
that are simply NOT TRUE in the REAL WORLD.

 Thank you,
 Sergey

You're welcome
Kevin