Andrew M. Bishop
Sat, 04 Mar 2006 03:09:43 -0800
Miernik <[EMAIL PROTECTED]> writes:

> Andrew M. Bishop <[EMAIL PROTECTED]> wrote:
> > Miernik <[EMAIL PROTECTED]> writes:
> >
> >> I tried in the Purge section:
> >>
> >> <http://*/*POST*> age = 0
> >>
> >> It doesn't work (POSTs are not matched by it). Why?
> >
> > It doesn't work because the 'POST' is not really part of the URL, it
> > is just something that WWWOFFLE adds when it displays it to the user.
>
> I found out that it didn't work, because the sites I tried had also CGI
> parameters in the POST URL, but when I did:
>
> <http://*/*!POST*> age = 0
> <http://*/*?*!POST*> age = 0
>
> it works both for URLs with CGI parameters, and without.

OK, looking at the code I think that you are correct: the POST part of
the URL is present when WWWOFFLE processes the URL.  It is removed when
requesting from the server, but otherwise WWWOFFLE treats it as if it
were real.

> It appears that WWWOFFLE treats CGI parameters specially, and http://*/*
> will not match http://foo.com/bar?baz , you have to use http://*/*?*
> Wait, but when I use http://foo.com/* in my Purge section, everything is
> matched OK, also if the URL has CGI parameters. So I don't know.

Have you looked at the README.CONF file that comes with WWWOFFLE or the
URL http://localhost:8080/configuration/#URL-SPECIFICATION

-------------------- README.CONF --------------------
In general this is written as
        (proto)://(host)[:(port)]/[(path)][?(args)]

Where [] indicates an optional feature, and () indicate a user supplied
name or number.

Some example URL-SPECIFICATION options are the following:

*://*/*             Any protocol, Any host, Any port, Any path, Any args
                    (This is the same as saying 'default').

*://*/(path)        Any protocol, Any host, Any port, Named path, Any args

*://*/*?            Any protocol, Any host, Any port, Any path, No args

*://*/(path)?*      Any protocol, Any host, Any port, Named path, Any args

*://(host)          Any protocol, Named host, Any port, Any path, Any args

(proto)://*/*       Named proto, Any host, Any port, Any path, Any args

(proto)://(host)/*  Named proto, Named host, Any port, Any path, Any args

(proto)://(host):/* Named proto, Named host, Default port, Any path, Any args

*://(host):(port)/* Any protocol, Named host, Named port, Any path, Any args
-------------------- README.CONF --------------------

If you use a wildcard on the path part of the URL it applies to anything
on that path, even CGIs that have their own arguments.

> Besides that binding ? as the character defining where CGI
> parameters begin and & as the separator, is not 100% sure

It is 100% sure because of the way that the '?' character is defined.
If you read the specifications for URLs (RFC 1738, RFC 1808 and RFC 2396
are the important ones) you will see that the '?' character is special.

> , I've seen
> sites which use / character for both. For example bahn.de uses mixed:
> http://reiseauskunft.bahn.de/bin/zuginfo.exe/en/67557/289483/593024/273993/80/ld=212.56&seqnr=5&ident=m9.02412414.1141116804&currentReferrer=tp&
> Here "?" character is not used anywhere in the URL, but it clearly has
> CGI parameters.

They look like CGI parameters, and they may be treated like them in the
server, but to the browser they are not CGI parameters.  They will never
be put there by filling in a form, for example.

If the page at http://www.foo/fakecgi/a=1&b=2&c=3 contains a link
written as <a href="bar.html"> then the page that the browser will fetch
is http://www.foo/fakecgi/bar.html.  If the original URL had been
http://www.foo/realcgi?a=1&b=2&c=3 then the same link would direct the
browser to http://www.foo/bar.html.

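The difference between the two cases can be checked with any standard URL
library.  The following is only a small sketch in Python (not WWWOFFLE code;
the helper name describe() is made up for this illustration) that splits each
URL and resolves the relative link bar.html against it:

```python
from urllib.parse import urljoin, urlsplit

FAKE_CGI = "http://www.foo/fakecgi/a=1&b=2&c=3"   # no '?', so it is all path
REAL_CGI = "http://www.foo/realcgi?a=1&b=2&c=3"   # '?' starts the args part

def describe(base, link="bar.html"):
    """Return (path, args, resolved link) for a base URL (hypothetical helper)."""
    parts = urlsplit(base)
    return parts.path, parts.query, urljoin(base, link)

for url in (FAKE_CGI, REAL_CGI):
    path, args, target = describe(url)
    print(f"{url}\n  path={path!r} args={args!r}\n  bar.html -> {target}")
```

For the fake CGI the whole "a=1&b=2&c=3" string is reported as part of the
path and the args are empty, which is why the relative link stays below
/fakecgi/.
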
> Some time ago era.pl used URLs like this:
> http://era.pl/index.php/id=p_2xtaktak/section=taktak/zone=-1
> now they changed them to standard:
> http://era.pl/index.php?id=p_2xtaktak&section=taktak&zone=-1
>
> Here also I would like to wash the zone=-1 as it is useless, and the
> number changes, making the same URL cached multiple times, and not
> being able to access the cached URL from links, because the zone=nn
> number is different. It is also probably only used by them to track from
> which page a user came to this page.

Performing this "washing" on URLs with fake CGIs would be more difficult,
because you need to take the path apart into its components, look for the
components that contain '&' characters, and then take those apart as well.

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
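
P.S. A rough sketch of the path-splitting approach described above, written in
Python purely for illustration.  This is not how WWWOFFLE is implemented and
nothing like it exists in WWWOFFLE as far as this message says; the function
name wash_fake_cgi() and the idea of passing a set of unwanted parameter names
are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def wash_fake_cgi(url, unwanted):
    """Drop name=value pieces whose name is in `unwanted` from the URL path.

    Hypothetical helper: the path is split on '/', each component is split
    on '&', and any piece of the form name=value with an unwanted name is
    removed.  A component that ends up with no pieces is dropped entirely.
    """
    parts = urlsplit(url)
    washed = []
    for component in parts.path.split("/"):
        pieces = component.split("&")
        kept = [p for p in pieces
                if not ("=" in p and p.split("=", 1)[0] in unwanted)]
        # Keep empty components (leading or trailing '/') untouched.
        if kept or not component:
            washed.append("&".join(kept))
    return urlunsplit(parts._replace(path="/".join(washed)))

print(wash_fake_cgi(
    "http://era.pl/index.php/id=p_2xtaktak/section=taktak/zone=-1",
    {"zone"}))
```

The real args part after a '?' is deliberately left alone here, since that
case does not need the path to be taken apart.
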