I munged the directory names intentionally (and unnecessarily) :) Basically it's a weird issue (partly our problem, but more likely naughty search spiders).
Our site has been structured like this:
http://www.pubcrawler.com/Template/index.cfm

/Template is our root directory.

For some reason yet to be determined, we are getting a high number of requests for pages like:
http://www.pubcrawler.com/template/index.cfm

(I found these invalid requests during an infrequent cleanup / bug-tracking pass on our application server: the admin GUI kept showing recurring real-time 404s for /template requests.)

We have scripts like:
http://www.pubcrawler.com/Template/ReviewWC.cfm/flat/BREWERID=107345

which dumb spider(s) are munging into:
http://www.pubcrawler.com/Template/reviewwc.cfm/flat/BREWERID=107345

and sometimes:
http://www.pubcrawler.com/template/reviewwc.cfm/flat/BREWERID=107345

Most requests seem to be lowercased URLs. I haven't checked on the originator of the requests yet, nor found the source of the invalid URLs.

Fortunately, we only have maybe a dozen or two scripts with mixed-case names and only one directory, so the set is small enough to work around in the interim :)

The big problem really is the sheer number of these requests that have been 404'd since whoever it is began sending them :) We have tens of millions of pages of content, so it could be significant.

Interesting mystery, and something that may get more interesting when I track down the requester(s).

Jędrzej's regex worked 100% for me, so part of the issue is solved.

I determined that Varnish (the cache server) up front was caching these as 404s (it shouldn't be), which made perfecting and testing the regex manually impossible.

Waiting patiently for Cherokee caching functionality for balancer content :) I love Varnish's speed, but find it a pain to configure, and I regularly have to make the config file more complicated.

On Sun, Feb 20, 2011 at 11:47 AM, Alvaro Lopez Ortega <[email protected]> wrote:
> Hello there,
>
> On 20/02/2011, at 16:54, pub crawler wrote:
>
>> We have a high traffic problem.
>>
>> Have a directory:
>>
>> http://www.website.com/Directory/whatever.php
>>
>> Search spiders are going nuts requesting this (1000's of these wrong
>> requests a day)
>> http://www.website.com/directory/whatever.php
>>
>> How do I simply handle this transparently to internally redirect to
>> the proper /Directory subdirectory instead of the wrong /directory
>> subdirectory?
>>
>> (I am at a loss on the regex stuff - anyone with a useful
>> tool/builder/reference please recommend).
>
> I don't think I'm understanding what the problem is. Neither of the previous
> URLs is working, actually.
>
> Could you please clarify what the problem is? And, besides, how many of those
> directories do you have?
>
> --
> Octality
> http://www.octality.com/
>
> _______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee
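
For reference, the case fix-up discussed in this thread can be sketched outside the web server as well. This is only an illustration in Python, not the Cherokee rule or the regex mentioned above (that regex is not reproduced here). The list CANONICAL_NAMES and the function canonicalize() are hypothetical names, and the sketch assumes that only the one directory and a small, known set of script names differ in case.

```python
# Hypothetical list of the canonical, mixed-case names on the site.
# Assumption: this small set can be maintained by hand.
CANONICAL_NAMES = ["Template", "ReviewWC.cfm", "index.cfm"]

# Map each lowercased name to its canonical spelling.
_LOOKUP = {name.lower(): name for name in CANONICAL_NAMES}


def canonicalize(path: str) -> str:
    """Restore the canonical case of known path segments.

    Segments that are not in the lookup table (for example
    "flat" or "BREWERID=107345") are passed through unchanged.
    """
    segments = path.split("/")
    return "/".join(_LOOKUP.get(seg.lower(), seg) for seg in segments)


if __name__ == "__main__":
    print(canonicalize("/template/reviewwc.cfm/flat/BREWERID=107345"))
```

A request whose canonicalized path differs from the requested path could then be redirected or rewritten internally. One caveat of this per-segment approach: any trailing segment that happens to match a known name case-insensitively would also be rewritten, so a real rule may want to restrict the fix-up to the first one or two segments.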
