Hello!

I want to index HTML pages spread over a general-purpose directory
tree and served by an Apache server. I run into trouble with
occasional soft links to directories that result in cycles. In such a
case, the indexing never completes.
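
To make the kind of link I mean concrete, here is a minimal sketch in
Python (not part of my ht://Dig setup; the function name
find_cyclic_links is just an illustration) that lists soft links
resolving to the directory that contains them or to one of its
ancestors. It only inspects the file system and assumes nothing about
how htdig or Apache follow the links.

```python
import os


def find_cyclic_links(root):
    """Return soft links under root that resolve to their own directory
    or to one of its ancestors (hypothetical helper, for illustration)."""
    cyclic = []
    # followlinks=False: links to directories are listed but not entered,
    # so the walk itself cannot loop.
    for dirpath, dirnames, _ in os.walk(root, followlinks=False):
        here = os.path.realpath(dirpath)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # The link closes a cycle if its target is the containing
            # directory itself or any directory above it.
            prefix = target.rstrip(os.sep) + os.sep
            if here == target or here.startswith(prefix):
                cyclic.append(path)
    return sorted(cyclic)


if __name__ == "__main__":
    import sys

    for link in find_cyclic_links(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(link)
```

Run against a directory holding a link such as "dot -> .", this should
report that link and nothing else.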

I am trying to work around the problem by using HeaderName /
SuppressHTMLPreamble and a robots meta tag with the content "none".
So far, I have failed to get this to work reliably.

Now, I was trying to set up a minimal test case, and I cannot
reproduce the error anymore!

I have a test htdig.conf with "start_url: http://localhost/tst", a
soft link in my Apache DocumentRoot: "tst -> /home/mgirod/tmp/hd/tst",
and in the tst directory:

  lrwxrwxrwx    1 mgirod   Domain U        1 Feb 26 16:15 dot -> .
  -rw-r--r--    1 mgirod   Domain U      209 Feb 28 11:11 index.html

In the current version (where I am trying to reproduce the problem),
my index.html does *not* contain the line:

  <meta NAME="robots" CONTENT="none">

$ cat index.html
<!--global preamble automatically inserted-->
<html>
 <head>
  <title>Index</title>
 </head>
 <body>
<h1>Index</h1>

<ul>
  <li><a href="dot">dot</a>
</ul>
</body></html>
$ 

However, contrary to my expectation, the indexing completes fine
(starting with an empty database):

$ htdig -i -a -c ~/tmp/hd/htdig.conf -s -v
ht://dig Start Time: Fri Feb 28 11:29:19 2003

New server: localhost, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language: 
0:2:0:http://localhost/tst:  redirect
htdig: Run complete
htdig: 1 server seen:
htdig:     localhost:80 1 document

HTTP statistics
===============
 Persistent connections    : Yes
 HEAD call before GET      : No
 Connections opened        : 2
 Connections closed        : 2
 Changes of server         : 0
 HTTP Requests             : 2
 HTTP KBytes requested     : 0.428711
 HTTP Average request time : 0 secs
 HTTP Average speed        : inf KBytes/secs

ht://dig End Time: Fri Feb 28 11:29:19 2003
$ 

Er... can anybody tell me why?

Side question: what controls whether the db.urls file is produced? On
one host it gets produced, and on a second one with a similar
configuration it does not.

-- 
Marc Girod        P.O. Box 323        Voice:  +358-71 80 25581
Nokia NBI         00045 NOKIA Group   Mobile: +358-50 38 78415
Takomo 1 / 4c27   Finland             Fax:    +358-71 80 61604
