In Apache, server redirects can be done with mod_alias or mod_rewrite. Both can use regular expressions, which should be good news: even if you can't spot a pattern in the placement and types of tags, there is still a pattern in the characters that tags use. Unless your valid URIs contain greater-than or less-than characters, you can write a rule for mod_alias or mod_rewrite. I encourage this approach because (1) it brings your visitors to valid pages, (2) it trains search engines to purge their outdated or incorrect entries and list valid pages instead, and (3) it will work with any version of Analog without modification. That third one could save you lots of time, especially as newer versions of Analog are released. To my way of thinking, the first is the most important, but the second could chip away at the root of the whole matter.
You could adapt the following to deal with greater numbers of tags (the example handles two tags), change the syntax to fit mod_rewrite (the example uses mod_alias), and extend it beyond just alphanumeric characters and slashes (as in the example). Both directives are needed, and each is meant to be on a single line.
RedirectMatch 301 ^<([\/\w]*)>([\/\s\w]*)<([\/\w]*)>([\/\s\w]*) http://host.domain.tld/$2$4
RedirectMatch 301 ^([\/\s\w]*)<([\/\w]*)>([\/\s\w]*)<([\/\w]*)>([\/\s\w]*) http://host.domain.tld/$1$3$5
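For comparison, a mod_rewrite version might look something like the sketch below. It is untested and rests on a few assumptions: the rules sit in the main server or virtual-host configuration (where the pattern sees the full URL-path, leading slash included), mod_rewrite is loaded, and TAGS_STRIPPED is a made-up variable name chosen for illustration. Rather than counting tags, it removes one tag per pass and repeats until none are left.

```apache
RewriteEngine On

# Remove one <...> tag (letters, digits, underscores, slashes) and rerun
# the rules from the top; remember that something was stripped.
RewriteRule ^(.*)<[/\w]*>(.*)$ $1$2 [N,E=TAGS_STRIPPED:1]

# Only when at least one tag was removed, send a permanent redirect
# to the cleaned-up path.
RewriteCond %{ENV:TAGS_STRIPPED} =1
RewriteRule ^(.*)$ http://host.domain.tld$1 [R=301,L]
```

Because the tags are stripped in a loop, the same pair of rules should cover one, two, or more tags without listing each combination.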
-- Duke
Patrick Robinson wrote:
Oh, I see what you're suggesting. No, that won't work; that URI I gave was just an example. There are *lots* of different requests for many different URIs, and the way the tags get added is unpredictable. I need a rule-based approach. I was thinking you were suggesting somehow using mod_rewrite to remove the tags from the requested URIs -- which I'm sure it can probably do, if you're a mod_rewrite guru! :-)
Basically, these requests are all invalid, and I don't know why they're getting requested. As far as I'm concerned, they should ALL return 404. I don't know why Apache is responding with a 206; I guess the client is sending Range headers in these requests, although I can't imagine why.
Perhaps a better solution (better than excluding ALL 206 responses) would be to configure Analog to exclude requests that appear to contain HTML tags! :-/
- Patrick
On Aug 31, 2004, at 10:44 AM, Duke Hillard wrote:
Using Apache 2.x.xx, I added a line to the config file (/usr/local/apache2/conf/httpd.conf is the default location). The line needs to appear below the DocumentRoot directive and within the <Directory> directive which relates to the DocumentRoot (/usr/local/apache2/htdocs is the default location).
In your case, the line might look like this (all on one line)
RedirectMatch 301 ^/pubs/<b>food</b>s(.*) http://host.domain.tld/pubs/foods$1
In the case of Apache, RedirectMatch 301 accomplishes two tasks: (1) it brings the end user to the desired page and (2) it indicates to search engines that index/reindex the page that it is permanently moved (helps search engines correct their hyperlinks). If you're using another server, check its documentation to see if it supports server redirection (many servers do). Of course, the syntax may change and file names/locations are likely to be different.
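As a rough illustration of the placement described above, the relevant part of httpd.conf might look like the fragment below. The paths are the default locations already mentioned; whatever directives are already present inside the <Directory> block are assumed to stay as they are.

```apache
DocumentRoot "/usr/local/apache2/htdocs"

<Directory "/usr/local/apache2/htdocs">
    # ... existing directives for this directory stay unchanged ...

    # Redirect the tag-polluted URI to the clean one (all on one line).
    RedirectMatch 301 ^/pubs/<b>food</b>s(.*) http://host.domain.tld/pubs/foods$1
</Directory>
```

The server has to be restarted or reloaded before a change to httpd.conf takes effect.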
HTH,
-- Duke
Patrick Robinson wrote:
What kind of redirect did you use to do that? Did you have a regular expression to look for and remove pairs of tags from the requested URL? What does it look like?
thanks,
- Patrick
On Aug 30, 2004, at 11:48 AM, Duke Hillard wrote:
I previously encountered a similar situation. My solution was to include a server redirect in my server's config file. The redirect brought visitors to the correct page and the resulting log entries were parsed by Analog in a manner that satisfied me.
HTH,
-- Duke
Patrick Robinson wrote:
Is there a way to restrict what gets included in the Directory Report, by HTTP status code? That is, requests that result in a "206 Partial Content" get included, but I want to exclude them.
Rationale:
I'm regularly seeing requests for URLs that look like this:
/pubs/<b>food</b>s/348-907/348-907.html
I don't know why, but some engine or other is putting <b></b> tags around portions of an otherwise valid URL. And my server often responds with a 206. These end up appearing in my Directory Report, and I'd rather they not.
Thanks!
-- Patrick Robinson AHNR Info Technology, Virginia Tech [EMAIL PROTECTED]
begin:vcard fn:Duke Hillard n:Hillard;Duke org:University of Louisiana at Lafayette;University Computing Support Services adr:;;P.O. Box 42770;Lafayette;LA;70504-2770;USA email;internet:[EMAIL PROTECTED] title:University Webmaster tel;work:337.482.5763 url:http://www.louisiana.edu/ version:2.1 end:vcard
_______________________________________________ analog-help mailing list [EMAIL PROTECTED] http://lists.meer.net/mailman/listinfo/analog-help

