On Sun, 2002-10-13 at 04:47, fabio rohrich wrote:
> HI!
> I wrote you last time about my development of a new
> apache module.
>
> mod_blanks: a module for the Apache web server which would on-the-fly
> remove unnecessary blank space, comments and other non-interesting
> things from the served page. Skills needed: the C language, a bit of
> text parsing techniques, HTML, learn Apache API. Complexity: low to
> moderate (after learning the API). Usefulness: moderate to low (but
> maybe better than that, it's a kind of nice toy topic that could be
> shown to save a lot of bandwidth on the Internet :-).
>
> So, the question is: I'm developing it for my bachelor thesis
> and my teacher told me it's too easy to develop it.
> So, do you have any ideas, like something more to do (something
> like compression) or other things to add to the module?
If you want to stick with the mod_blanks idea but make it
more advanced (so that it's complicated enough to be
a thesis project), here are a couple of ideas:
* Removing extra spaces/comments/etc from HTML while delivering
it is a good idea, but it's not necessarily something that
you want your web server to do on every request. If you
deliver the same page a hundred times per day (or a hundred
times per second), it's wasteful to keep doing the same
parsing work on the same file over and over. So one
possibility is: make the module smart enough to cache
the "optimized" versions of pages.
* Another challenge with mod_blanks is that there is a
tradeoff between bandwidth cost and hardware cost. If you
do a lot of processing to reduce the bytes sent (removing
extraneous spaces, compression, etc), it will reduce your
bandwidth cost, but you'll have to spend more on server
hardware. And if your server suddenly gets a lot of
traffic, it might be able to handle the extra load, but
not if it also has to do all the mod_blanks processing
(the same idea applies to mod_deflate also). So one idea
that might be interesting is: Let the server administrator
define which optional filters can be skipped when the server
is heavily loaded. (An "optional" filter in this situation
would mean something that we could skip without causing a
bad response to be sent to the client. So mod_deflate counts
as optional, for example, but mod_include doesn't.) Then,
during request processing, decide whether to run the
optional filters based on how overloaded the server is.
* One more idea: do some research to determine which is
faster: removing blanks and comments, or just compressing
the HTML. Or, to put it another way, build mod_blanks and
compare its performance to mod_deflate. Mod_blanks would
have an advantage, because it can use simpler and faster
code. On the other hand, mod_deflate also has an advantage
because it will result in a smaller block of bytes being
written to the socket, which usually will reduce the CPU
time spent in the kernel. Which one will win? Or is it
better to do both: eliminate spaces and comments, and also
compress?
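
To get a first feel for the blank-removal side of that comparison,
here is a minimal sketch in plain C. It is only an illustration under
some assumptions: it does not use the Apache filter API, the name
collapse_blanks is hypothetical, it does not strip comments, and it
treats all whitespace as collapsible (a real module would have to
leave <pre>, <textarea>, and script content alone).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the Apache API.
 *
 * Copies `len` bytes from `in` to `out`, replacing every run of
 * whitespace (spaces, tabs, newlines) with a single space.
 * `out` must have room for at least `len` bytes, since the output
 * is never longer than the input. The output is NOT NUL-terminated.
 *
 * Returns the number of bytes written to `out`.
 */
size_t collapse_blanks(const char *in, size_t len, char *out)
{
    size_t n = 0;       /* bytes written so far */
    int in_space = 0;   /* nonzero while inside a whitespace run */

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (isspace(c)) {
            /* Emit one space for the first blank in a run only. */
            if (!in_space)
                out[n++] = ' ';
            in_space = 1;
        } else {
            out[n++] = (char)c;
            in_space = 0;
        }
    }
    return n;
}
```

Called on the 28-byte string "<p>\n    Hello,   world\n</p>\n", this
writes the 22 bytes "<p> Hello, world </p> ". For the comparison,
you could time this pass and the compression pass separately over the
same set of pages, and also record the output size of each, and of
both combined.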
Brian