Thanks, Duke. Those are good ideas. Another (more quick-n-dirty) solution is simply to have analog exclude requests that contain angle brackets:

FILEEXCLUDE REGEXP:<.*>
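Before dropping that line into the Analog config, you can sanity-check which paths the pattern catches with a small Python sketch like the one below. It assumes Analog's REGEXP: matches anywhere in the request path, the way Python's re.search does; the function name is made up for illustration.

```python
import re

# Same pattern as the Analog line: FILEEXCLUDE REGEXP:<.*>
# Assumption: Analog counts a match anywhere in the path as a hit,
# which is what re.search does here.
TAG_PATTERN = re.compile(r"<.*>")


def is_excluded(path: str) -> bool:
    """Return True if the request path would be dropped by the exclusion."""
    return TAG_PATTERN.search(path) is not None
```

With this, "/pubs/<b>food</b>s/348-907/348-907.html" is excluded and "/pubs/foods/348-907/348-907.html" is kept.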

This seems to do the trick for the immediate problem, but I do rather like your suggestion. On the other hand, I kinda hate to get into the business of catering to all the various ways that some engines seem to junk up my URIs. :)

Thanks again,

- Patrick


On Aug 31, 2004, at 3:54 PM, Duke Hillard wrote:
In Apache, server redirects can be done with
mod_alias or mod_rewrite.  Both can use
regular expressions.  This should be good news.
Even if you can't spot a pattern in the placement
and types of tags, there is still a pattern present
in the characters that tags use.  Unless your URIs
legitimately contain less-than or greater-than
characters, you can write a rule for mod_alias or
mod_rewrite.  I encourage this approach because:
(1) it brings your visitors to valid pages, (2) it
trains search engines to purge their outdated or
incorrect entries and list valid pages instead, and
(3) it will work with any version of Analog without
modification.  That third one could save you lots
of time, especially as newer versions of Analog are
released.  To my way of thinking, though, the first
is most important, and the second could chip away
at the root of the whole matter.

You could adapt the following to deal with more
tags (the example handles two), you could change
the syntax to fit mod_rewrite (the example uses
mod_alias's RedirectMatch), and you could extend
the character classes beyond the letters, digits,
underscores, slashes, and whitespace they match
now.  Both lines are needed, and each directive
must sit on a single line.

RedirectMatch 301 ^<([\/\w]*)>([\/\s\w]*)<([\/\w]*)>([\/\s\w]*) http://host.domain.tld/$2$4
RedirectMatch 301 ^([\/\s\w]*)<([\/\w]*)>([\/\s\w]*)<([\/\w]*)>([\/\s\w]*) http://host.domain.tld$1$3$5
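The two rules above only handle exactly two tags and a limited set of characters. The underlying idea (drop the tags, keep the text between them) can be sketched in Python for any number of tags. This is not Apache code: redirect_target and HOST are hypothetical names, and the tag pattern assumes simple tags such as <b> or </em> with no attributes.

```python
import re
from typing import Optional

# Hypothetical stand-in for the redirect rules: matches simple opening or
# closing tags with no attributes, such as <b>, </b>, <em>.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")

# Placeholder host, as in the RedirectMatch examples.
HOST = "http://host.domain.tld"


def redirect_target(path: str) -> Optional[str]:
    """Return the 301 target for a path containing tags, or None if it has none."""
    cleaned = TAG.sub("", path)  # remove every tag, keep the text between them
    if cleaned == path:
        return None  # no tags found, so no redirect is needed
    return HOST + cleaned
```

On the example path from this thread it returns http://host.domain.tld/pubs/foods/348-907/348-907.html. Removing every tag in one substitution sidesteps the one-rule-per-tag-count limitation of the RedirectMatch approach.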


-- Duke



Patrick Robinson wrote:

Oh, I see what you're suggesting. No, that won't work; that URI I gave was just an example. There are *lots* of different requests for many different URIs, and the way the tags get added is unpredictable. I need a rule-based approach. I was thinking you were suggesting somehow using mod_rewrite to remove the tags from the requested URIs, which I'm sure it can do if you're a mod_rewrite guru! :-)

Basically, these requests are all invalid, and I don't know why they're getting requested. As far as I'm concerned, they should ALL return 404. I don't know why Apache is responding with a 206; I guess the client is sending Range headers in these requests, although I can't imagine why.

Perhaps a better solution (better than excluding ALL 206 responses) would be to configure analog to exclude requests that appear to contain HTML tags! :-/

- Patrick


On Aug 31, 2004, at 10:44 AM, Duke Hillard wrote:

Using Apache 2.x.xx, I added a line to the config file
(/usr/local/apache2/conf/httpd.conf is the default location).

I put the line below the DocumentRoot directive, within
the <Directory> section that relates to the DocumentRoot
(/usr/local/apache2/htdocs is the default location).

In your case, the line might look like this (all on one line):
RedirectMatch 301 ^/pubs/<b>food</b>s(.*) http://host.domain.tld/pubs/foods$1
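To see what the $1 back-reference picks up, here is the same pattern exercised with Python's re module. It assumes the pattern is compared against the path exactly as it appears in the log; apply_rule is a hypothetical name.

```python
import re
from typing import Optional

# The RedirectMatch pattern, unchanged; Apache's $1 is group(1) here.
RULE = re.compile(r"^/pubs/<b>food</b>s(.*)")


def apply_rule(path: str) -> Optional[str]:
    """Return the redirect target if the rule matches, else None."""
    m = RULE.match(path)
    if m is None:
        return None
    # Equivalent of http://host.domain.tld/pubs/foods$1
    return "http://host.domain.tld/pubs/foods" + m.group(1)
```

For "/pubs/<b>food</b>s/348-907/348-907.html", group 1 is "/348-907/348-907.html", so the visitor lands on the untagged page.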


In the case of Apache, RedirectMatch 301 accomplishes
two tasks: (1) it brings the end user to the desired page,
and (2) it tells search engines that index or reindex the
page that it has moved permanently (which helps search
engines correct their hyperlinks).

If you're using another server, check its documentation to
see if it supports server redirection (many servers do).
Of course, the syntax may change and file names/locations
are likely to be different.

HTH,

-- Duke


Patrick Robinson wrote:

What kind of redirect did you use to do that? Did you have a regular expression to look for and remove pairs of tags from the requested URL? What does it look like?

thanks,

- Patrick


On Aug 30, 2004, at 11:48 AM, Duke Hillard wrote:

I previously encountered a similar situation.  My solution
was to include a server redirect in my server's config file.
The redirect brought visitors to the correct page and the
resulting log entries were parsed by Analog in a manner
that satisfied me.

HTH,

-- Duke


Patrick Robinson wrote:

Is there a way to restrict what gets included in the Directory Report, by HTTP status code? That is, requests that result in a "206 Partial Content" get included, but I want to exclude them.

Rationale:

I'm regularly seeing requests for URLs that look like this:

   /pubs/<b>food</b>s/348-907/348-907.html

I don't know why, but some engine or other is putting <b></b> tags around portions of an otherwise valid URL. And my server often responds with a 206. These end up appearing in my Directory Report, and I'd rather they not.
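If Analog itself can't be told to skip 206 responses, one alternative is to pre-filter the log before Analog reads it. Below is a minimal Python sketch, assuming Common Log Format lines; the keep function, the sample lines, and the choice to also drop tagged paths are illustrative, not part of Analog.

```python
import re

# Assumption: Common Log Format, e.g.
# host ident user [date] "GET /path HTTP/1.1" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')


def keep(line: str, drop_status: frozenset = frozenset({206})) -> bool:
    """Return False for lines with an excluded status or a tagged path."""
    m = CLF.match(line)
    if m is None:
        return True  # leave lines we cannot parse for the analyzer to handle
    path, status = m.group(2), int(m.group(3))
    return status not in drop_status and "<" not in path


def filter_log(lines):
    """Yield only the lines worth passing on to the log analyzer."""
    return (line for line in lines if keep(line))
```

Dropping by status and dropping by tagged path are independent checks here, so either can be removed if you only want one of them.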

Thanks!

--
Patrick Robinson
AHNR Info Technology, Virginia Tech
[EMAIL PROTECTED]

_______________________________________________ analog-help mailing list [EMAIL PROTECTED] http://lists.meer.net/mailman/listinfo/analog-help
