>> Looking through Apache's access log, I noticed the following two lines:
>> 
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278

>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401 471
>> 
>> I did not create a "robots.txt" file, as my server is the only one that
>> indexes the site.

>That's fine, but htdig will still fetch it. It's required to do so by 'net 
>standards. It does this first off when it finds a server. I assume the 
>next line is your start_url? 

Okay, that is what I understood from your "A standard for Robot exclusion" page,
but I thought that another server was trying to access my site.
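
If I understand correctly, I could at least silence the 404 by putting an
allow-everything robots.txt at the document root; if I am not mistaken, the
minimal form is just:

User-agent: *
Disallow: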

And yes, the next line is my start_URL.

>> It looks as if there is some kind of automatic indexing (of course 4:02 is 
>> nowhere to be found in my crontab) 

>Well, it has to be launched somehow, either from 'cron' or 'at', since htdig 
>cannot launch itself. What time is in your crontab? 

Here is an extract from my crontab (for the root user):

35 9-18 * * 1-5 /root/bin/rundig.sh
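
(That is minute 35 of every hour from 9:00 to 18:00, Monday to Friday, so
nothing in there should fire anywhere near 4:02.)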


>> that after it my db.wordlist file is 
>> empty... 

>And if you run the script yourself from the command-line it works fine? 
>What cron program/version do you use? 

The script works fine, even when run from cron (my cron is Vixie Cron:
$Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $).

The robot stops on the first page: this must be due to authentication, and since it
does not index any pages, my db.wordlist file is erased (I run htdig with the -i
option).
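
I suppose the next thing for me to try is to run it by hand with more verbosity
and watch where it stops; if I have the syntax right, something along these lines
(the config path is only a placeholder, not necessarily where mine lives):

htdig -vvv -i -c /etc/htdig/htdig.conf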

If authentication is the problem, why doesn't it find the username and password
that are in the rundig.sh script?
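
As far as I understand the htdig options, the credentials are supposed to be
passed with the -u switch, so what I think the dig should look like is roughly
this (the config path and the user:password pair below are placeholders, not my
real values):

#!/bin/sh
# rough sketch only, not my actual rundig.sh
CONF=/etc/htdig/htdig.conf            # placeholder path to the htdig config file
htdig -v -i -c $CONF -u user:password # -i: fresh dig, -u: credentials for the protected pages
htmerge -v -c $CONF                   # build the searchable databases afterwards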

By the way, I am sorry if I ask any stupid questions; I am not a very
experienced Linux user!

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:            <http://www.htdig.org/FAQ.html>
