Hi, sorry, but I thought the information would be enough. As the subject
says, we are using 3.1.5.
OK, now some more details:
You can try it on
http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html.
It's a German site, but you should be able to find the link to the print
version in the lower left corner, where it says "Druckversion" next to the
printer symbol. It leads to
http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html?pp=1
Here's the config. The included files should not be important; they just
set some variables and hold most of the htsearch settings.
# include some server-dependent stuff
include: ../htsearch.inc-serverconf
# include some server-independent stuff
include: htdig.inc-global
# Changed for each database. Places the databases in separate directories
# for convenience and organization
database_dir: /${HOST}/data/www/${SERVERNAME}/htdig/db/test
# Each database has a separate list of starting URLs
# This makes it easier to index a variety of categories
start_url: http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html
restrict_urls: http://www.yavivo.de/
url_part_aliases: http://www${SERVERNUMBER}11.yavivo.de/ replace#4
# Any database-specific config options should go here...
exclude_urls: /cgi-bin/ .exe .cgi /search/ .txt .css .jpg .gif \
addMessageForm /disclaimer \
0/Hilfe 1/Hilfe 2/Hilfe 3/Hilfe 4/Hilfe 5/Hilfe \
6/Hilfe 7/Hilfe 8/Hilfe 9/Hilfe changeMessageForm
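To sanity-check the URL filtering, the link URLs from the log excerpt can be replayed against these patterns. The following is a hypothetical Python sketch, not htdig code: it assumes htdig treats each pattern as a plain substring match, and the allowed helper and LINKS list are illustrative. It tries two allow-lists: http://www.yavivo.de/ from restrict_urls, and the start_url itself. The second is worth checking because htdig 3.1.x's own URL limit attribute, limit_urls_to, defaults to ${start_url} (if I read the attribute documentation right).

```python
# Hypothetical sketch: replay link URLs from the -vv log against the config's
# URL patterns. Assumption: each pattern is matched as a plain substring.

EXCLUDE_URLS = (
    "/cgi-bin/ .exe .cgi /search/ .txt .css .jpg .gif "
    "addMessageForm /disclaimer "
    "0/Hilfe 1/Hilfe 2/Hilfe 3/Hilfe 4/Hilfe 5/Hilfe "
    "6/Hilfe 7/Hilfe 8/Hilfe 9/Hilfe changeMessageForm"
).split()

START_URL = "http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html"


def allowed(url, limits, excludes=EXCLUDE_URLS):
    """Accept url if it contains some limit pattern and no exclude pattern."""
    if not any(p in url for p in limits):
        return False
    return not any(p in url for p in excludes)


# Links taken from the log excerpt.
LINKS = [
    "http://www.yavivo.de/index.html",
    START_URL + "?pp=1",
    "http://www.yavivo.de/Home/35Autorenverzeichnis/AutorenJ_L/Kroeger.html",
]

# Compare the restrict_urls prefix with the start_url as the URL limit.
for limits in (["http://www.yavivo.de/"], [START_URL]):
    print(limits)
    for url in LINKS:
        print("  ", "accept" if allowed(url, limits) else "reject", url)
```

With http://www.yavivo.de/ as the limit, all three links pass and none hits an exclude_urls pattern. With the start_url as the limit, only the ?pp=1 link passes, because it contains the start_url as a substring.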
As you can see from the following -vv excerpt, all URLs except the
...?pp=1 one are rejected (I shortened it a bit by stripping most of the
"url rejected" lines).
New server: www.yavivo.de, 80
pick: www.yavivo.de, # servers = 1
0:0:0:http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html:
title: Reisemedizin
More than one <title> tag in document! (possible search engine spamming)
META Description: Ich mache im Sommer eine 2-monatige Reise nach Bolivien,
wo ich von La Paz ausgehend Ausflüge über mehrer Tage mit einem Jeep ins
Land machen möchte, z. B .: Cobija, Sucre, Potosi,...Welche Impfungen
brauche ich? Ist Gelbfieber nötig? Brauche ich 3 ...
A tag: pos = 2, position = ="http://www.yavivo.de/index.html">
url rejected: (level 1)http://www.yavivo.de/index.html
A tag: pos = 2, position = ="http://www.yavivo.de/index.html">
... many more url rejected lines ...
pushing http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html?pp=1
+A tag: pos = 2, position = ="http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html?pp=1" target="_blank" title="Druckversion">
*A tag: pos = 2, position = ="http://www.yavivo.de/Home/35Autorenverzeichnis/AutorenJ_L/Kroeger.html">
... many more url rejected lines ...
pick: www.yavivo.de, # servers = 1
1:1:1:http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html?pp=1: A tag: pos = 2, position = ="http://www.yavivo.de/Expertenrat/Forum/Reisemedizin/985878142/message.html">
* size = 7486
pick: www.yavivo.de, # servers = 1
htdig: Run complete
htdig: 1 server seen:
htdig: www.yavivo.de:80 2 documents
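For comparison, this is the behaviour I'd expect from content="index,nofollow": the page gets indexed, but none of its links are queued, the print-version link included. Below is a minimal Python sketch using the standard html.parser module. It illustrates the robots meta semantics only and is not htdig's implementation; the RobotsAwareLinkParser class and the sample page (built from the test.com URLs in my original mail quoted below) are hypothetical.

```python
from html.parser import HTMLParser


class RobotsAwareLinkParser(HTMLParser):
    """Collect <a href> targets, but follow none if robots meta says nofollow.

    Hypothetical illustration, not htdig code. For simplicity the nofollow
    decision is applied after the whole page has been parsed.
    """

    def __init__(self):
        super().__init__()
        self.nofollow = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            directives = {d.strip().lower() for d in content.split(",")}
            # "none" is shorthand for "noindex,nofollow".
            if "nofollow" in directives or "none" in directives:
                self.nofollow = True
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def links_to_follow(self):
        return [] if self.nofollow else list(self.hrefs)


page = """<html><head>
<meta name="robots" content="index,nofollow">
</head><body>
<a href="http://www.test.com/index.html">Home</a>
<a href="http://www.test.com/document.html?pp=1" target="_blank"
   title="Druckversion">Print</a>
</body></html>"""

parser = RobotsAwareLinkParser()
parser.feed(page)
print(parser.links_to_follow())  # []
```

Both links are seen by the parser, but because the robots meta tag carries nofollow, neither is returned for following.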
-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 4, 2001 00:50
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] nofollow bug in 3.1.5?
According to Reich, Stefan:
> in my (test) config I have start_url www.test.com/document.html
> This document contains <meta name="robots" content="index,nofollow">.
> Besides some other links, it also contains a link to
> http://www.test.com/document.html?pp=1 (we are using Zope and generate a
> print version of a document if the parameter is provided).
>
> Although htdig is not following any other link on the page, it follows
> http://www.test.com/document.html?pp=1.
>
> Is this a bug or is there something else wrong with my config?
Very hard to say with the little bit of information you've provided.
There were problems with the handling of meta robots tags in older
versions, before 3.1.3, and versions before 3.1.0b1 didn't handle these
tags at all. If you have a more recent version, try running htdig -vvv
to see what's happening before and after it hits the tags and links
in question. It's pretty hard to guess at the problem when given a URL
we can't use.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930