Hi!

This is an addition to Marc's post. Our main issue is that htdig starts a cycle when it
indexes a directory that contains a soft link pointing back to itself, and it never seems
to stop. Here are some logs and some information about our environment.
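
To illustrate why such a crawl cannot terminate on its own, here is a minimal Python sketch. It assumes (this is not stated above) that `sl` is a symlink to its own directory, like `dot -> .` in Marc's test case. Every additional `sl/` in the URL then maps to the same real directory, so a crawler that only compares URL strings keeps finding "new" pages. The helper names are hypothetical and have nothing to do with htdig's internals.

```python
import os
import tempfile


def build_test_tree(root: str) -> None:
    """Recreate a tiny tree: one HTML file plus a symlink 'sl'.
    Assumption: 'sl' points at its own directory ('.')."""
    with open(os.path.join(root, "test_gcs_tnx.html"), "w") as f:
        f.write("<html><body>test</body></html>")
    os.symlink(".", os.path.join(root, "sl"))


def nested_paths(root: str, depth: int) -> list[str]:
    """Paths a URL-based crawler would visit: root, root/sl, root/sl/sl, ..."""
    return [os.path.join(root, *(["sl"] * n)) for n in range(depth + 1)]


def distinct_real_dirs(paths: list[str]) -> set[str]:
    """Resolve symlinks; the whole chain collapses to one real directory."""
    return {os.path.realpath(p) for p in paths}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_test_tree(root)
        paths = nested_paths(root, 5)
        for p in paths:
            print(os.path.relpath(p, root))
        print(len(paths), "URL paths,", len(distinct_real_dirs(paths)), "real directory")
```

Resolving each path with `os.path.realpath` is one way to notice the loop; a crawler that only sees URLs over HTTP has no access to that information, which is why it keeps descending.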

> cat /etc/redhat-release
Red Hat Linux release 8.0 (Psyche)

> rpm -qi httpd
Name        : httpd                        Relocations: (not relocateable)
Version     : 2.0.40                            Vendor: Red Hat, Inc.
Release     : 11                            Build Date: Wed 09 Oct 2002 03:04:34 PM EEST
Install date: Mon 24 Feb 2003 02:12:16 PM EET      Build Host: daffy.perf.redhat.com
Group       : System Environment/Daemons    Source RPM: httpd-2.0.40-11.src.rpm
Size        : 2702107                          License: Apache Software License

> rpm -qi htdig
Name        : htdig                        Relocations: /usr
Version     : 3.2.0                             Vendor: Red Hat, Inc.
Release     : 7.20020505                    Build Date: Sun 11 Aug 2002 06:15:46 AM EEST
Install date: Mon 24 Feb 2003 04:06:19 PM EET      Build Host: porky.devel.redhat.com
Group       : Applications/Internet         Source RPM: htdig-3.2.0-7.20020505.src.rpm
Size        : 31698043                         License: GPL

And here are the parts of the configuration we have changed while trying to fix this problem.

-httpd.conf-
...
<Directory "/var/www/html">
#
    Options Indexes FollowSymLinks MultiViews
...
Alias /htdig_sltest/ "/home/laitijar/test/"

<Directory "/home/laitijar/test">
    Options FollowSymLinks Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
...
IndexOptions SuppressHTMLPreamble FancyIndexing VersionSort NameWidth=*
...
HeaderName /autoindex-header.html
...

-htdig.conf-
database_dir:           /var/lib/htdig
start_url:              http://localhost/htdig_sltest/
limit_urls_to:          ${start_url}
exclude_urls:           /cgi-bin/ .cgi /usage/
bad_extensions:         .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
maintainer:             [EMAIL PROTECTED]
max_head_length:        10000
max_doc_size:           200000
no_excerpt_show_top:    true
search_algorithm:       exact:1 synonyms:0.5 endings:0.1
next_page_text:         <img src="/htdig/buttonr.gif" border="0" align="middle" width="30" height="30" alt="next">
no_next_page_text:
prev_page_text:         <img src="/htdig/buttonl.gif" border="0" align="middle" width="30" height="30" alt="prev">
no_prev_page_text:
page_number_text:       '<img src="/htdig/button1.gif" border="0" align="middle"
...

And here is the output of htdig:
> sudo htdig -v -i -s -a

ht://dig Start Time: Tue Mar  4 13:36:07 2003

New server: localhost, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language:
0:2:0:http://localhost/htdig_sltest/: ++++-++ size = 761
1:3:1:http://localhost/htdig_sltest/?C=N&O=D: +***-** size = 761
2:8:1:http://localhost/htdig_sltest/test_gcs_tnx.html:  size = 1040
3:4:1:http://localhost/htdig_sltest/?C=M&O=A: *+**-** size = 761
4:7:1:http://localhost/htdig_sltest/sl/: ++++*++ size = 774
5:6:1:http://localhost/htdig_sltest/?C=D&O=A: ***+-** size = 761
6:5:1:http://localhost/htdig_sltest/?C=S&O=A: **+*-** size = 761
7:17:2:http://localhost/htdig_sltest/?C=D&O=D: ****-** size = 761
8:18:2:http://localhost/htdig_sltest/?C=S&O=D: ****-** size = 761
9:15:2:http://localhost/htdig_sltest/sl/sl/: ++++*++ size = 777
10:14:2:http://localhost/htdig_sltest/sl/?C=D&O=A: +**+*** size = 774
11:10:2:http://localhost/htdig_sltest/?C=M&O=D: ****-** size = 761
12:9:2:http://localhost/htdig_sltest/?C=N&O=A: ****-** size = 761
13:11:2:http://localhost/htdig_sltest/sl/?C=N&O=D: ******* size = 774
14:16:2:http://localhost/htdig_sltest/sl/test_gcs_tnx.html:  size = 1040
15:12:2:http://localhost/htdig_sltest/sl/?C=M&O=A: *+***** size = 774
16:13:2:http://localhost/htdig_sltest/sl/?C=S&O=A: **+**** size = 774
17:27:3:http://localhost/htdig_sltest/sl/?C=M&O=D: ******* size = 774
18:28:3:http://localhost/htdig_sltest/sl/?C=S&O=D: ******* size = 774
19:20:3:http://localhost/htdig_sltest/sl/sl/?C=M&O=A: ++***** size = 777
20:19:3:http://localhost/htdig_sltest/sl/sl/?C=N&O=D: ******* size = 777
21:25:3:http://localhost/htdig_sltest/sl/?C=N&O=A: ******* size = 774
22:24:3:http://localhost/htdig_sltest/sl/sl/test_gcs_tnx.html:  size = 1040
23:22:3:http://localhost/htdig_sltest/sl/sl/?C=D&O=A: ***+*** size = 777
24:23:3:http://localhost/htdig_sltest/sl/sl/sl/: ++++*++ size = 780
25:26:3:http://localhost/htdig_sltest/sl/?C=D&O=D: ******* size = 774
26:21:3:http://localhost/htdig_sltest/sl/sl/?C=S&O=A: **+**** size = 777
27:36:4:http://localhost/htdig_sltest/sl/sl/sl/sl/: ++++*++ size = 783
28:38:4:http://localhost/htdig_sltest/sl/sl/?C=S&O=D: ******* size = 777
29:31:4:http://localhost/htdig_sltest/sl/sl/?C=D&O=D: ******* size = 777
30:30:4:http://localhost/htdig_sltest/sl/sl/?C=M&O=D: ******* size = 777
31:34:4:http://localhost/htdig_sltest/sl/sl/sl/?C=S&O=A: +*+**** size = 780
32:35:4:http://localhost/htdig_sltest/sl/sl/sl/?C=D&O=A: ***+*** size = 780
33:29:4:http://localhost/htdig_sltest/sl/sl/?C=N&O=A: ******* size = 777
34:37:4:http://localhost/htdig_sltest/sl/sl/sl/test_gcs_tnx.html:  size = 1040
35:32:4:http://localhost/htdig_sltest/sl/sl/sl/?C=N&O=D: ******* size = 780
36:33:4:http://localhost/htdig_sltest/sl/sl/sl/?C=M&O=A: *+***** size = 780
37:39:5:http://localhost/htdig_sltest/sl/sl/sl/sl/?C=N&O=D: +****** size = 783
38:48:5:http://localhost/htdig_sltest/sl/sl/sl/?C=M&O=D: ******* size = 780
39:46:5:http://localhost/htdig_sltest/sl/sl/sl/?C=S&O=D: ******* size = 780
40:43:5:http://localhost/htdig_sltest/sl/sl/sl/sl/sl/: ++++*++ size = 798
41:42:5:http://localhost/htdig_sltest/sl/sl/sl/sl/?C=D&O=A: ***+*** size = 783
42:44:5:http://localhost/htdig_sltest/sl/sl/sl/sl/test_gcs_tnx.html:  size = 1040
43:41:5:http://localhost/htdig_sltest/sl/sl/sl/sl/?C=S&O=A: **+**** size = 783
44:40:5:http://localhost/htdig_sltest/sl/sl/sl/sl/?C=M&O=A: *+***** size = 783
45:47:5:http://localhost/htdig_sltest/sl/sl/sl/?C=D&O=D: ******* size = 780
46:45:5:http://localhost/htdig_sltest/sl/sl/sl/?C=N&O=A: ******* size = 780
47:53:6:http://localhost/htdig_sltest/sl/sl/sl/sl/sl/?C=D&O=A: +**+*** size = 798
48:52:6:http://localhost/htdig_sltest/sl/sl/sl/sl/sl/?C=S&O=A: **+**** size = 798
49:56:6:http://localhost/htdig_sltest/sl/sl/sl/sl/?C=D&O=D: .******* size = 783
htdig: Run complete
htdig: 1 server seen:
htdig:     localhost:80 60 documents

HTTP statistics
===============
 Persistent connections    : Yes
 HEAD call before GET      : No
 Connections opened        : 52
 Connections closed        : 52
 Changes of server         : 0
 HTTP Requests             : 52
 HTTP KBytes requested     : 47.1641
 HTTP Average request time : 0.0576923 secs
 HTTP Average speed        : 15.7214 KBytes/secs

ht://dig End Time: Tue Mar  4 13:36:10 2003
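
Two things multiply the URLs in this log. Each FancyIndexing listing from Apache's mod_autoindex contains column-sort links such as `?C=N&O=D`, and every one of them returns a page of the same size as the plain directory URL (761 bytes for `/htdig_sltest/`, 774 for `/htdig_sltest/sl/`). On top of that, each level of `sl/` produces a new directory URL. The sketch below uses a handful of URLs copied from the log and separates the two effects by stripping the query string and counting `sl` segments. It is only an analysis aid with hypothetical helper names, not an htdig feature.

```python
from urllib.parse import urlsplit

# A few URLs copied verbatim from the htdig -v output above.
CRAWLED = [
    "http://localhost/htdig_sltest/",
    "http://localhost/htdig_sltest/?C=N&O=D",
    "http://localhost/htdig_sltest/?C=M&O=A",
    "http://localhost/htdig_sltest/sl/",
    "http://localhost/htdig_sltest/sl/?C=D&O=A",
    "http://localhost/htdig_sltest/sl/sl/",
    "http://localhost/htdig_sltest/sl/sl/sl/",
    "http://localhost/htdig_sltest/sl/sl/sl/?C=S&O=A",
]


def strip_query(url: str) -> str:
    """Drop the query string, e.g. the autoindex sort parameters ?C=..&O=.."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def sl_depth(url: str) -> int:
    """Count the 'sl' path segments, i.e. how often the link was followed."""
    return urlsplit(url).path.split("/").count("sl")


if __name__ == "__main__":
    unique = sorted({strip_query(u) for u in CRAWLED})
    for u in unique:
        print(sl_depth(u), u)
    print(len(CRAWLED), "crawled URLs,", len(unique), "without query strings")
```

Removing the sort variants shrinks the list, but the `sl` depth still grows without bound, so the cycle itself remains.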


Thanks in advance,
Jarno


Marc Girod wrote:

> Hello!
>
> I want to index html pages spread over a general purpose directory
> tree, and served by an Apache server. I get into trouble with
> occasional soft links to directories resulting in cycles. In such a
> case, the indexing doesn't complete.
>
> I am trying to work around the problem by using HeaderName /
> SuppressHTMLPreamble and a robots meta tag with "none" as its content. So
> far, I have not managed to get it to work reliably.
>
> Now, I was trying to set up a minimal test case, and I cannot get the
> error anymore!
>
> I have a test htdig.conf with "start_url: http://localhost/tst", a
> soft link in my Apache DocumentRoot: "tst -> /home/mgirod/tmp/hd/tst",
> and in the tst directory:
>
>   lrwxrwxrwx    1 mgirod   Domain U        1 Feb 26 16:15 dot -> .
>   -rw-r--r--    1 mgirod   Domain U      209 Feb 28 11:11 index.html
>
> In the current version (trying to reproduce the problem), my index.html does
> *not* contain the line:
>
>   <meta NAME="robots" CONTENT="none">
>
> $ cat index.html
> <!--global preamble automatically inserted-->
> <html>
>  <head>
>   <title>Index</title>
>  </head>
>  <body>
> <h1>Index</h1>
>
> <ul>
>   <li><a href="dot">dot</a>
> </ul>
> </body></html>
> $
>
> However, contrary to my expectation, the indexing goes fine
> (starting with an empty database):
>
> $ htdig -i -a -c ~/tmp/hd/htdig.conf -s -v
> ht://dig Start Time: Fri Feb 28 11:29:19 2003
>
> New server: localhost, 80
>  - Persistent connections: enabled
>  - HEAD before GET: disabled
>  - Timeout: 30
>  - Connection space: 0
>  - Max Documents: -1
>  - TCP retries: 1
>  - TCP wait time: 5
>  - Accept-Language:
> 0:2:0:http://localhost/tst:  redirect
> htdig: Run complete
> htdig: 1 server seen:
> htdig:     localhost:80 1 document
>
> HTTP statistics
> ===============
>  Persistent connections    : Yes
>  HEAD call before GET      : No
>  Connections opened        : 2
>  Connections closed        : 2
>  Changes of server         : 0
>  HTTP Requests             : 2
>  HTTP KBytes requested     : 0.428711
>  HTTP Average request time : 0 secs
>  HTTP Average speed        : inf KBytes/secs
>
> ht://dig End Time: Fri Feb 28 11:29:19 2003
> $
>
> Er... can anybody tell me why?
>
> Side question: what drives the production of the db.urls file? On one
> host it gets produced, but on a second one with a similar
> configuration it does not.
>
> --
> Marc Girod        P.O. Box 323        Voice:  +358-71 80 25581
> Nokia NBI         00045 NOKIA Group   Mobile: +358-50 38 78415
> Takomo 1 / 4c27   Finland             Fax:    +358-71 80 61604
>
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html



