Paul A. Rombouts
Tue, 16 May 2006 22:13:40 -0700
One of the main reasons I keep using WWWOFFLE is that it is very useful for avoiding ads. Unfortunately, the features offered by the DontGet and ModifyHTML sections are insufficient to get rid of ads that are written into the web page itself rather than referenced by external links, which a DontGet list can block. Examples of this that I find particularly annoying are the "Linux Reference Center" ads sponsored by Microsoft on linuxtoday.com.

When I learned about a Firefox extension called Greasemonkey, which makes it possible to run user-written JavaScript code to change web page content, it occurred to me that it could also be used to remove almost any advertisement. This turned out to work quite well. One shortcoming is that ads are removed only after a page has loaded, so they can often still be seen briefly. The biggest shortcoming, though, was that I was forced to use Firefox to enjoy this feature. And to my chagrin, when I upgraded to Firefox 1.5, most of the Greasemonkey scripts that I had written broke for some reason.

One day, when I was looking at the source code in src/htmlmodify.l that implements the add-cache-info feature, I had a really neat idea. I could add a new feature to WWWOFFLE that lets me insert HTML code from a local file into the page being modified. By putting my own JavaScript inside this HTML code, I could get most of the possibilities of Greasemonkey in any browser that supports JavaScript. I have called the new option "insert-file", and it is included in the patch that I publish at my WWWOFFLE web page http://www.phys.uu.nl/~rombouts/wwwoffle.html .
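
To make the idea concrete, here is a minimal sketch of the kind of script that an inserted HTML file could carry inside a <script> element. It is not the linuxtoday script itself. The helper name removeByXPath and the 'adbox' class used in the example call are assumptions for illustration, and it relies on the browser providing the DOM method document.evaluate.

```javascript
// Hypothetical helper: remove every node matched by an XPath expression.
// `doc` is the page's document object; in a browser, pass `document`.
function removeByXPath(doc, expr) {
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE. A snapshot
  // result stays usable while the matched nodes are being removed.
  var ORDERED_NODE_SNAPSHOT_TYPE = 7;
  var result = doc.evaluate(expr, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var removed = 0;
  for (var i = 0; i < result.snapshotLength; i++) {
    var node = result.snapshotItem(i);
    if (node.parentNode) {
      node.parentNode.removeChild(node);
      removed++;
    }
  }
  return removed;
}

// Only hook into the page when a browser with XPath support is present.
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
  window.addEventListener("load", function () {
    // Assumed example expression; adjust it to the page being cleaned up.
    removeByXPath(document, "//div[@class='adbox']");
  }, false);
}
```

Because the helper takes the document as a parameter, the same function can be tried against any page by changing only the XPath expression. Like a Greasemonkey script, it runs after the page has loaded, so the ads may still flash briefly.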

I also have an example of an include file that I use to get rid of the ads on linuxtoday.com that I mentioned above: http://www.phys.uu.nl/~rombouts/wwwoffle/linuxtoday_adblocker.html.txt

This example script uses the XPath (http://www.w3.org/TR/xpath) evaluator that is available in Mozilla-based browsers (at least in fairly recent versions). I find that XPath is a very useful language for expressing which parts of an HTML tree I want to get rid of. That is why I am seriously thinking about implementing a feature in WWWOFFLE that allows you to censor HTML on the basis of XPath expressions. The feature would allow you to add something like this to the ModifyHTML section:

  <http://*example.com> censor-html = //[EMAIL PROTECTED]'ad_content' or @class='adbox']

This would, for example, get rid of everything starting with a <div id="xyz" class="ad_content"> tag up to and including the corresponding </div> end tag, even before the content reaches the browser.

The "censor-html" feature is still only in the planning stage. Nevertheless, I would be interested to know what people on this mailing list think of this possibility.

-- Paul A. Rombouts