Hello all...
Kevin Kiley here...
Here is a mixture of comment/response regarding mod_gzip and the
ongoing conversation(s)...
There is a (short) SUMMARY at the bottom.
Justin Erenkrantz's original post...
> Ian has posted his mod_gz filter before, now I'd like to give it a +1.
>
> I told him I'd look at it a while ago, but never got a chance to do
> so. So, I spent this morning cleaning up the configuration and a bit
> of the code to fit our style (nothing major).
>
> I'd like to add this to the modules/filters directory (which seems
> like the most appropriate place).
>
> Justin
Ryan Bloom responded...
> I have a few problems with this. 1) We have consistently declined to
> accept mod_gzip from Remote Communications.
Correct... and for a while the reason was because everyone thought the
module was using ZLIB and there has been a long standing aversion to
including ANY version of GNU ZLIB ( or any other GNU stuff ) into the
Apache tree. We have personal emails from your board members stating
that to be the case. If that aversion has evaporated then there is a TON of
GNU based stuff that is now 'eligible' for inclusion in the core
distribution, right?
GNU issues aside... we have consistently been told that the other reason
mod_gzip would not be included is because of the 'support' issue. Apache
has a standing rule that no module goes into the distribution unless you
are absolutely SURE that it will be adequately supported. Makes sense.
We have been consistently supporting this technology for years now.
Ian himself said he 'Just got bored and decided not to do any real
work one weekend' and cranked out a filter demo that happened to
include ZLIB. He has not demonstrated any intention of actually
'supporting' it other than making a few modifications to the 'demo'
( and even those have yet to appear ).
If that doesn't raise the issue of 'will it be supported' I don't
know what would.
mod_gzip has NEVER used 'ZLIB' for a number of reasons... the primary
one being that ZLIB is inadequate for the purpose. ZLIB is a
non-multithread-safe, file-based compression library. The secondary
reason is actually so
that you folks wouldn't have to worry about the 'GNU' issues you have if/when
you wanted to use the code. Read the comments at the top of mod_gzip.c
right underneath the Apache license section which has always been at
the top of mod_gzip and the syntax of which was personally approved
by at least 3 of your top level board members via personal email(s).
> 2) I keep hearing that zlib has more memory leaks than a sieve.
It does. Any non-multithread-safe, file-oriented library does if you
just slap it into a multithreaded environment and start using it without
putting critsecs around the calls to serialize it.
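The serialization Kevin describes can be sketched generically. This is a
hypothetical illustration in Python rather than C, and `LegacyCompressor`
is a stand-in (not zlib itself): it represents any library that keeps
internal state between calls and so cannot be shared across threads. A
single lock ( the 'critsec' ) makes access safe at the cost of
serializing every request.

```python
import threading
import zlib

class LegacyCompressor:
    """Stand-in for a stateful, non-thread-safe compression library."""

    def __init__(self):
        self._stream = None

    def compress(self, data):
        # State lives on the instance between these calls; two threads
        # interleaving here would interleave their output streams.
        self._stream = zlib.compressobj()
        out = self._stream.compress(data)
        out += self._stream.flush()
        self._stream = None
        return out

_compressor = LegacyCompressor()
_lock = threading.Lock()  # the 'critsec' serializing all access

def compress_response(body):
    # Only one thread may be inside the shared compressor at a time.
    with _lock:
        return _compressor.compress(body)
```

Every worker thread funnels through that one lock, which is exactly the
throughput penalty being pointed at: under heavy load the server ends up
compressing one response at a time.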
> 3) I don't believe that we
> should be adding every possible module to the core distribution.
That's always been the rule. It is still a mystery when/how something
'rises' to the level of importance to be included in the Apache
distribution ( E.g. WebDav, etc ). That being said... see next comment.
> I personally think we should leave the core as minimal as possible, and
> only add more modules if they implement a part of the HTTP spec.
mod_gzip does that. It lets Apache implement the IETF Content-Encoding
specification that has ALWAYS been a part of HTTP 1.1.
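For readers unfamiliar with the mechanism: the negotiation is just a
pair of headers defined by HTTP 1.1. A minimal ( hypothetical ) exchange
looks like this, with most headers omitted...

```
GET /index.html HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 2870

<gzip-compressed entity body>
```

The client advertises what it can decode via Accept-Encoding; the server
may then send the entity compressed and labeled with Content-Encoding,
or ignore the hint and send it uncompressed.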
> Before this goes into the main tree, I think we need to seriously think
> about those topics.
Think HTTP 1.1 compliance.
> I would be MUCH happier if this was a sub-project, or just a module
> that Ian distributed on his own.
>
> Ryan
Cliff Wooley wrote...
>> Ryan Bloom wrote...
>>
>> I have a few problems with this. 1) We have consistently declined to
>> accept mod_gzip from Remote Communications.
>
> That's true, though that was for 1.3. Just now with Peter's message is
> the first time I've heard that mod_gzip for 2.0 was even nearing release.
> I'm not prejudiced... whichever one is better wins. :)
We have said any number of times that even the ORIGINAL version of mod_gzip
was coded/tested against the ORIGINAL (alpha) release of Apache 2.0. It was
only when we realized how long Apache 2.0 was away from either a beta or
a GA that we ported it BACKWARDS to Apache 1.3.x so people could start using
it right away... and they have ( thousands of folks ).
As recently as a few weeks ago we said again that a 2.0 version of mod_gzip
was 'ready to go' but we just wanted to make sure the filtering API's were
going to stop changing ( which they still were at the time ).
Now our only concern is that the filtering I/O is actually WORKING
the way it is supposed to from top to bottom. Even recent messages
regarding Content-length support and such indicate there is still
some work to be done. We just want the existing Apache 2.0 I/O
scheme to be CERTIFIED by the people that wrote it ( E.g. BETA at least )
before we CERTIFY our own product(s) against that same Server product.
If you were convinced that mod_gzip was only for Apache 1.3.x then you
haven't been reading the forum closely.
> I don't suppose the guys at Remote Communications would be willing to
> share the code for mod_gzip for 2.0 with some of us developers privately
> so that we can get a feel for it before it's released publicly, would
> they?
The moment we are sure that the actual Apache Web Server you are trying to
use to compress responses is stable and actually able to do the IETF
Content-encoding without screwing up the responses we will release the code.
We would prefer GA but decided that, itself, is so far off that we will
settle for at least 1 known good BETA.
That being said... if you REALLY are frothing at the mouth to see what is
essentially the same code that's already in the public domain modified
to use Apache 2.0 filtering ( if available ) then email me and I will
probably send it to you.
If you look at the existing mod_gzip.c which you can now download from
thousands of places and you simply use your imagination and substitute
standard filtering calls at the I/O points then that's about all there
is to it.
> > 2) I keep hearing that zlib has more memory leaks than a sieve.
> Maybe it does, but that can be dealt with.
It does... and it has ALREADY been dealt with. See the compression
code in the existing mod_gzip.c source file ( There is only one single
source module for mod_gzip. EVERYTHING that is needed to perform
dynamic IETF Content-Encoding is in mod_gzip.c. You do NOT need ZLIB ).
Why in God's name Ian didn't just use the multi-thread-safe compressor
that's already in the publicly available mod_gzip in his weekend-warrior
filtering demo is a mystery.
> Even so, it shouldn't be a
> consideration here IMO, at least not yet. If it really is leaky (which it
> very well might be but we should prove it rather than just going on what
> we heard), then it's a consideration, but it'd be better for somebody to
> just *fix* zlib than for us to throw out both mod_gz and mod_gzip because
> of zlib's deficiencies (assuming we care, which we probably do).
FWIW... we have such a 'fixed' version of ZLIB.
( See comments elsewhere about our work with some of the major
Internet testing companies and how a 'fixed' version of ZLIB
became a REQUIREMENT ).
Also FWIW... We already have a relationship with Dr. Mark Adler
( One of the original authors of ZLIB and GZIP ).
> > 3) I don't believe that we should be adding every possible module to
> > the core distribution. I personally think we should leave the core as
> > minimal as possible, and only add more modules if they implement a
> > part of the HTTP spec.
>
> My personal opinion is that this one is important enough that it should go
> in. Most clients support gzip transfer coding, and it's a very real
> solution to the problem of network bandwidth being the limiting factor on
> many heavily-loaded web servers and on thin-piped clients (read: modem
> users).
Amen.
> mod_gz(ip) could provide a significant throughput improvement in
> those cases, where the CPU is twiddling its thumbs while the network pipe
> is saturated.
Of course. This has always been the case.
See the mod_gzip forum. There are thousands of messages to this effect.
> This fills a gap in Apache that could be a very big deal to
> our users.
That's what we have been trying to say for over 3 years now and we have
proved it with mod_gzip for Apache 1.3.x. Once most Apache ( and other
Web server ) admins see it in action they would never consider NOT using it.
> Cliff
Justin Erenkrantz wrote...
> mod_gzip implements the gzip algorithm. It also happens to be a 300k
> source file (~11,000 lines). mod_gz is a 14k file and is 446 lines
> and relies on zlib.
Comparing mod_gzip with Ian's filtering 'demo' as to 'number of lines'
is just plain silly...
First... more than HALF of the mod_gzip.c source code module is nothing
but very detailed 'storytelling' style DEBUG code. If turned on that
'storytelling' debug code tells you more than you ever wanted to know
about not only the IETF Content-Encoding decision making process but the
inner workings of Apache itself.
I removed all of that once everyone reported it as working fine but a
lot of people shot right back and said "Please put the great 'storytelling'
debug back in... it wasn't until I read the mod_gzip debug output that
I finally understood exactly how an Apache module really works!"
Regardless of people saying they are using the mod_gzip 'storytelling'
debug as an actual Apache-internals tutorial... it simply doesn't need
to be there. mod_gzip has been stable for quite some time.
Second... mod_gzip has an Item-mapping database in it that, if you study
the issues, you will discover is the minimum that is needed to fully
support the IETF Content-Encoding standard in the REAL WORLD. I would
say that if you remove the debug code... then again more than half the
code is simply the required 'Item mapping' support and the support for
all of the other 'critical' configuration directives associated with
IETF Content-Encoding support.
If you seriously think that the 'simpler' mod_gz is 'ready' to fully support
IETF Content-encoding then you are smoking something.
If you simply add the actual 'minimal' code to support IETF Content-Encoding
in the real world to mod_gz then the code is going to balloon. No question.
Also... in all fairness... if you were to add the actual ZLIB
headers and the 32 source code support files it needs to the
'line count' for mod_gz you will discover that mod_gz is already
about 4 times LARGER than mod_gzip even with no real-world 'decision making'
support code. mod_gzip is 1 single source file which includes
all the compression support you need 'built-in'. You don't need
ZLIB anywhere on your computer to compile/use mod_gzip.c.
> Knowing the people on this list I will bet that the size of the file
> went a long way for us not accepting Remote Communications's version
> in the core distribution. My cause for not accepting mod_gzip would
> be that implementing the gzip algorithm is better left to someone
> else - like the guy who wrote gzip. I mean no offense to Remote
> Communications as I'm sure their implementation is sound.
No offense taken.
We knew it was sound a few years ago and you are late in the day
if you think we need you to verify it. See the mod_gzip forum.
> I will accept the most concise and correct version - at this time,
> Ian's version is the only one being offered.
Where's the fire?
If I take what you are saying correctly... you think that your
personal need to compress some bloated HTML files TODAY with a Server
codebase that hasn't even gone beta yet is justification for
throwing some weekend-warrior filtering demo into the actual
Apache Server codebase?
If that's what's really going on here I would say that it would
be an historic event if such a trivial need is what got a pile
of demo code permanently added to the Apache core distribution.
It's not my call... but that's my opinion.
> There hasn't been a release of zlib since 1998. I'm not concerned
> about this (because if there were memory leaks, *someone* would have
> addressed them by now).
They have. Search the Web. Try Google.
People who have simply thought they could just add ZLIB to their
multi-threaded Servers have already posted the results 'out there'.
It doesn't work. The only real solution is to serialize the
requests and surround everything with CRITSECS. Not good.
We have been working with some of the major Internet testing
companies and this has been their discovery as well. In their
efforts to support IETF Content-decoding on the client side they
all made the same assumption that ZLIB would work for even the
heaviest benchmarking loads and they have all made the same
discovery... it just doesn't cut the mustard.
> I believe that this does implement a core component of the HTTP spec
> - namely Transfer-Encoding. Most current browsers have gzip support
> built-in (Netscape, Opera, and Mozilla do - I bet IE does, but I
> don't use it) - however, we have never supported this. I believe it
> is time that we do so. Not having implemented this is a major
> oversight on our part.
While I do agree with you that http_protocol.c is probably the place
where IETF Content-Encoding has always needed to be supported...
there are other '3rd party modules' that actually add full HTTP support
to Apache. It's never been true that the 'core Apache' supplies
ALL of the answers to RFC 2616. Whether or not this particular part
of the SPEC should or should not be supported by core Apache is
the real debate.
> > Putting every module into the core is NOT the answer to this problem.
> > IMNSHO, Apache should be a minimalistic web server. If we don't need
> > it in the core, it shouldn't be there. Personally, I would remove
> > mod_dav from the server too, because it doesn't implement part of
> > RFC2616.
>
> I believe that DAV belongs in our standard module set because it is
> treated as an extension to HTTP. The same goes for our inclusion
> with SSL.
Whether or not mod_dav ( not part of RFC 2616 ) should or should not
be in an 'out of the box' Apache distribution is a long standing
debate. It was almost tossed from 2.0 but the debate fizzled out
and so it remained.
> I believe that anything that adds value to the server out-of-the-box
> belongs in the main repository. Things like mod_pop3 and mod_mbox
> are special modules that no one really cares much about.
How many actual warm bodies really care about mod_dav?
AFAIK... not many.
> My +1 for adding this to the core distribution of modules stands.
> I fully believe that adding this functionality to the server is
> completely worth it.
Maybe so... but maybe Roy Fielding said it best a few messages
ago... in light of the real progress finally being made at
debugging 2.0 itself, and the fact that the light at the end of
the tunnel no longer looks like an oncoming train, there is reason
enough to 'table' this kind of discussion for a while.
As strong an advocate as I have been for the inclusion of IETF
Content-Encoding support in Apache going back 2 or 3 years now
even I don't want anything to distract the 'full-timers' ( Covalent
boys ) from getting this 2.0 thing 'out the door'.
Again... where is the house on fire? Is it only YOUR house
that's burning?
Eli Marmor wrote...
> If I recall correctly, this "guy who wrote gzip" (or - to be precise -
> one of the two guys who wrote it) is working with Remote Communications.
That's correct. We already have a relationship with Dr. Mark Adler.
It's a private contract and I am not at liberty to discuss the
details on this public forum.
> If it's true, it means that he feels OK with their implementation (maybe
> it's similar?).
That would also be a correct assumption.
> Having one less library to depend on, is an advantage and
> not a disadvantage, even if it requires mod_gzip to be 300K (I believe
> that the 2.0 version will be smaller, thanks to the I/O filtering).
I don't really understand this concern with 'maybe it's smaller'.
Since when has Apache really cared how many lines of code something has in it?
One thing none of you seem to realize is that we are under no illusions
that the moment Apache 2.0 comes out the whole world is going to throw
the very stable Apache 1.3.x series in the toilet. You guys have been
obsessed with trying to get 2.0 out the door and have stated that you
'no longer support 1.3.x' but we live in the real world and we most
certainly DO still support Apache 1.3.x. It will take years for the
general Internet community to 'roll over' to Apache 2.0. Some never
will because... frankly... they just don't need it.
Be prepared to discover what you seem to be forgetting and that is that
all 'good' modules for Apache should be able to support BOTH the 1.3.x
design AND the 2.0 design with the same codebase.
That is what you will see in mod_gzip for Apache 2.0. The exact same
module can compile/run for EITHER Apache 1.3.x OR Apache 2.0.
Yea... sure... that 'adds some code' and 'makes it a little bigger'
but what is better for users?... a module that can support ANY
version of Apache or one that only supports 1.3 or 2.0 but not both?
We believe people want the former, not the latter, and will expect
the same for some time to come.
> Maybe we should simply ask him; His name is Mark Adler, more details at:
> http://www.alumni.caltech.edu/~madler/
Mark has that public website and, as such, is probably open to
receiving mail from anyone... but keep in mind that Mark is
currently the mission director for the Mars Rover Landing
mission at the Jet Propulsion Laboratory in Pasadena, CA, and
in addition to relationships with other vendors ( like us )
that makes him a pretty busy guy.
Guenter Knauf wrote...
> Hi,
> I was glad as Ian contributed his mod_gz; I tested it on Linux and Win32
> and it works for me.
What did you test?
How 'heavily loaded' was the Server?
Did you just ask for 1 thing, see if it came back compressed, and you
are calling that 'success'?
There's a LOT more to it than that.
> The problem I see with 3rd party modules is not mainly that they are
> 'invisible'. I've found tons of modules, and often 3 or more for the same
> purpose, but many modules were only written for Unix and fail to compile
> on other platforms because they often heavily use system libs which aren't
> available on the new target; so many modules must be ported to the target
> os and the concept of the Apache api is gone. And even if a module compiles
> without changes and no porting is needed it's not guaranteed to run.
> The best sample is mod_gzip: I use it on Linux and Win32, but on
> NetWare the module compiles fine but doesn't work!
This was/is an Apache 1.3.x issue only and this issue
was resolved on the mod_gzip forum. mod_gzip forum users have been
VERY good at helping each other out. mod_gzip for Apache 1.3.x doesn't
work 'out of the box' for IBM's rewrite of Apache, either, and in both
cases it's because those vendors are re-writing Apache headers and making
changes that are not in the standard Apache distributions ( or even
available anywhere online ).
That being said... if I recall, a number of the Netware problem
reports were simply from people that didn't realize you CAN use
mod_gzip to compress your SSL output but it takes a special
configuration. People were reporting output lengths of ZERO
in the compression statistics in ACCESS_LOG and didn't realize
that what happens is that SSL 'steals away' the connection handles
under Apache 1.3.x and delivers the responses outside of the
normal Apache delivery process. The pages were being delivered
fine but without the special configuration for mod_gzip they
were simply not being compressed.
This will be an ongoing issue for anything that tries to support
IETF Content-Encoding in an SSL enabled Server. You have to
do things 'in the right order'.
If someone rewrites Apache 2.0 headers and 're-distributes' Apache
the way IBM does you could easily start running into the same issues
with Apache 2.0 modules and/or filtering functions as well.
A module doesn't have to be 'part of the Apache core' to either
present or circumvent the problems you describe. It all depends
on what it's trying to do and whether the platform supports all
the calls being made.
There are 'patches' available for mod_gzip that solve the Netware
and IBM HTTPD issues. I believe Bill Stoddard himself is currently
supplying IBM customers with the 'right patch' for IBM HTTPD since
a number of Apache/IBM HTTPD clients are using mod_gzip ( or maybe
it was Andy over there in Raleigh ). I cannot afford IBM's rewrite
of Apache ( IBM HTTPD ) but I have the 'patches' submitted by others
to make it all work.
> I think this will not happen with Ian's module because it uses only
> Apache apis,
The Apache API's themselves are often 'functionally limited' depending
on the platform. See the 'os/xxx/network_io' files for good examples.
> so once the server runs on a platform mod_gz will do
> too (ok, as far as zlib is ported to that platform, but that's true
> for nearly every platform).
Key phrase here: 'nearly every platform'. ( Meaning: Not ALL platforms ).
> I was also in contact with Kevin, but he couldn't help me with the
> issue on NetWare...
I don't personally have/use Netware ( or IBM's HTTPD ) but other users on
the mod_gzip forum worked this out for themselves, as any good forum group
will do.
> Guenter
William Rowe wrote...
> Probably always need to set some 'threshold' of 8kb (minimally) that the
> webserver absolutely ignores, and some include or exclude list by mime type
> to describe the value of further compression. Even if the file is requested
> to be gzip'ped, and it's targeted for the cache, set a flag at the end that
> says "hey, I saved -2% on this file, don't let us do _that_ again! File foo
> shouldn't be cached", and then internally add foo to the excludes list for
> any gzip filter.
See the existing mod_gzip source code and/or documentation.
Configuration directives for these kinds of things have been
there since day one ( almost a year ago, now ).
In particular... see the 'item mapping' documentation and how the use
of Apache's own regexec is used to help make the exact decisions you
are describing.
Your assumptions are correct... there is a LOT more to it than just
checking for 'Accept-encoding:' and firing off a filter.
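To make that concrete, here is roughly what the threshold and
'item mapping' decisions look like in an httpd.conf using mod_gzip for
Apache 1.3.x. The directive names are recalled from the mod_gzip
documentation, not copied from a tested config... verify the exact
spelling and syntax against the shipped docs before using them.

```
# Skip responses outside a sane size window: tiny files gain
# nothing, and huge ones may not be worth the CPU.
mod_gzip_on                  Yes
mod_gzip_minimum_file_size   500
mod_gzip_maximum_file_size   500000

# Include/exclude decisions via regex against MIME type or filename,
# using Apache's own regex engine.
mod_gzip_item_include        mime  ^text/.*
mod_gzip_item_exclude        mime  ^image/.*
mod_gzip_item_exclude        file  \.js$
```

Images are excluded because they are already compressed; the .js
exclusion is the kind of workaround sites used for browsers known to
mishandle compressed scripts.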
Peter Cranstone wrote...
> Here is something for this forum if you are really interested in
> mod_gzip. Check out this link: http://www.schroepl.net/mod_gzip/
> It's an amazing analysis of mod_gzip on HTTP traffic and includes all
> different browser types. Here is what is amazing, check out the "saved"
> column and the "average" savings for all the different stats... About 51%
Since mod_gzip has always added 'compression stats' to Apache's own
ACCESS_LOG file ( mod_gz does not ) many people who have been running
mod_gzip for some time now have written some amazing log analysis
programs that show the 'true story'. The link that Peter has
quoted is just one of them. See the mod_gzip forum or the 'utilities'
links on the mod_gzip home page for other static and/or dynamic
CGI Apache log analysis scripts and reports.
* SUMMARY
Following some private emails from some of the above posters it's
obvious that everyone wants the same thing here.
Please don't let this degenerate into the same sort of pissing
contest about real-time compression that has happened every time before.
Forget the fact that Micro$oft has been adding an ISAPI IETF Content-Encoding
filter to IIS for some time and forget the fact that it's a joke... the
key thing is that the lumbering giant has been touting it as a 'feature'
and a 'selling point' for some time now and they have already read the
writing on the wall some time back.
If you read these postings and compare them with some discussions we
had on this forum over 2 years ago it's perfectly obvious that the
tide has turned and what was considered a 'stupid' thing to do inside
a Web Server ( dynamic compression ) is now becoming a 'requirement'.
Long overdue, my friends... but a welcome sight.
Whatever part our own mod_gzip has had to do with this 'turning of
the tide' represents the essence of what open source is all about and if
anyone thinks we are not trying to live up to that promise ( Experts in their
field contributing technology to the public ) then there isn't anything else
we can do to convince you folks of that.
It would seem you have 2 choices here...
1. Go ahead and add mod_gz and its required GNU ZLIB support to
the permanent Apache code tree right now and then only begin to discover
what is 'missing' to make it 'real' and then slog through a development
process that has to 'reinvent the wheel'. ( Sort of like mod_proxy has
always been ).
2. Wait just a few more days ( weeks at max ) for us to release the
fully tested Apache 2.0 version of a mature product that is currently
used ( and has been extensively tested ) by thousands of people under
all kinds of configurations and traffic weight scenarios and uses
the exact same Apache http.conf configurations that thousands of
people are already using.
No offense to Ian and his 'demo' filter... but I think I know which
choice I would make if I were an Apache code committer.
Just make sure your output is working ( Transfer-encoding chunked,
content-length, etc ) correctly under all circumstances and get to a
BETA and that's when we will release mod_gzip for Apache 2.0.
The addition of an IETF Content-Encoding filter is not going to add
anything to your output testing that you can't do right now... it
will only tend to obfuscate engine problems that still exist.
Put the horse before the cart... CERTIFY your filtering I/O as
'Ok' ( Go BETA, at least ) and THEN start nailing it with the
more complicated (experimental) I/O scenarios.
Performance will be an ongoing 'patch fest' but as far as it
actually working under all circumstances...
You are almost there.
Yours...
Kevin Kiley
PS: Your own ApacheBench still cannot accept IETF Content-Encoding
and VERIFY that it was received correctly. If you are thinking about
adding ANY IETF Content-Encoding support to Apache you might want
to reconsider accepting our own Enhanced ApacheBench which is now
over 2 years old and is perfectly capable of receiving and
analyzing IETF Content-Encoding.
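What 'receiving and analyzing' means here is mechanical but easy to get
wrong: the client has to notice the Content-Encoding header, decode the
body, and verify the result, not just count bytes on the wire. A
hypothetical sketch of that check in Python ( the header dict and
function are illustrative assumptions, not taken from the Enhanced
ApacheBench code )...

```python
import gzip
import io

def verify_gzip_body(headers, body, expected):
    """Return True only if the response body, decoded according to its
    Content-Encoding header, matches the expected entity."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        if body[:2] != b"\x1f\x8b":   # gzip magic bytes
            return False
        try:
            body = gzip.GzipFile(fileobj=io.BytesIO(body)).read()
        except (OSError, EOFError):   # corrupt or truncated stream
            return False
    elif encoding != "identity":
        return False                  # an encoding we cannot analyze
    return body == expected
```

A benchmark tool that skips this decode step will happily report
'success' on truncated or corrupted compressed responses, which is the
whole point of teaching ApacheBench to decode what it receives.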
As with mod_gzip... this 'Content-decoding' version of ApacheBench
was actually coded FIRST for the Apache 2.0 source code tree and
then was ported BACK to 1.3.x so it will work with either version.
Our Enhanced ApacheBench ( complete with Apache License ) is still
right here in the same place it was almost 2 years ago when we tried
to contribute it and were told to 'just put it on the Web somewhere
so people can find it'.
http://www.eHyperSpace.com/apache/ab