Hello

I have the following problem: my digging takes far too long!
I have a PC (SuSE Linux, Athlon 500, 128 MB RAM, reiserfs) on which I run
wwwoffle in combination with htdig. My archive is very big: it covers
about 200 hosts (most of them only advertising or otherwise small) with
130,000 files and 305 MB.
If I run htdig with server_wait_time=0 (the default value), htdig does its
job in about 1 to 1.5 hours. So far so good, but afterwards, when I search
my database, a lot of sites are missing. In the log I saw that several
servers were not dug; the log file contained something like "New server:
basis.shacknet.nu  no server running". Because of this I set
server_wait_time=1, since I read somewhere that this could solve the problem.
But now I have been digging my archive since Wednesday, and today is Monday
(so for 5 days). I'm confused, because all the sites to dig are on the local
machine. How can I speed this up? Do I need a faster PC? Is ReiserFS slowing
things down? Why this "no server running" error? Any suggestions?
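
For scale, here is a rough back-of-the-envelope sketch of how much time the
wait alone could add. It assumes (this is an assumption, not something
confirmed for this setup) that htdig pauses for server_wait_time seconds
before every document it fetches. The function name estimate_wait_hours is
made up for illustration.

```shell
#!/bin/sh
# Rough lower bound on dig time caused purely by server_wait_time.
# Assumption: the wait is applied once per fetched document.
estimate_wait_hours() {
    docs=$1   # number of documents to fetch
    wait=$2   # server_wait_time in seconds
    # Integer arithmetic: the result is truncated to whole hours.
    echo $(( $docs * $wait / 3600 ))
}

estimate_wait_hours 130000 1   # prints 36
estimate_wait_hours 130000 2   # prints 72
```

Under that assumption, 130,000 documents would cost about a day and a half
of pure waiting at 1 second, and about three days at 2 seconds, before any
actual fetching or parsing is counted.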

Thank you for every hint!

regards
David

----------------

This is my wwwoffle-htdig-full.conf:

#
# Config file for ht://Dig and WWWOFFLE.
#
# This configuration file is used by htdig with wwwoffle (for a full search).
#

#
# The location of the files.
#

database_dir: /spool/wwwoffle/html/htdig/db

#
# We need to use the wwwoffle proxy
#

http_proxy: http://localhost:9090/

#
# The list of URLs, with suitable recursion.
#

start_url: http://localhost:9090/htdig/start4.html

max_hop_count: 99999

exclude_urls: !none!

limit_urls_to:

#
# Set the modification time of the pages to the current time.
#

modification_time_is_now: true

#
# Other options I like
#
max_doc_size:           70000000
max_head_length:        1000000

allow_numbers: true

valid_punctuation: .-_/!#$%^&*'"

locale: de_DE

server_wait_time: 2
                                                                            

external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl \
                  application/msword->text/html /usr/local/bin/doc2html.pl \
                  application/rtf->text/html /usr/local/bin/doc2html.pl \
                  text/rtf->text/html /usr/local/bin/doc2html.pl

---------------

This is my wwwoffle-htdig-full-script:

#!/bin/sh

#### THE SPOOL DIR IS "/var/spool/wwwoffle" IN THE LINE BELOW ####

wwwoffle_spool=/spool/wwwoffle

####

# Set the path to include the htdig executables

PATH=$PATH:/opt/www/htdig/bin
export PATH

# Set the temporary directory used for merging

#TMPDIR=/tmp
#export TMPDIR

# Set up a log file.

echo > $wwwoffle_spool/html/htdig/wwwoffle-htdig.log

# Do the digging and merging

htdig -i -v -c $wwwoffle_spool/html/htdig/conf/htdig-full.conf \
    >> $wwwoffle_spool/html/htdig/wwwoffle-htdig.log 2>&1
htmerge -v -c $wwwoffle_spool/html/htdig/conf/htmerge.conf \
    >> $wwwoffle_spool/html/htdig/wwwoffle-htdig.log 2>&1
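
To get a feel for how many servers were skipped in a run, the log written by
the script above could be searched for the message quoted earlier. This is
only a sketch: the search string is taken from that quote, the exact wording
in a real log may differ, and the function name count_skipped_servers is made
up for illustration.

```shell
#!/bin/sh
# Count log lines that report a server htdig could not reach.
# The search string comes from the message quoted in the post;
# adjust it if the real log words the message differently.
count_skipped_servers() {
    grep -c 'no server running' "$1"
}

# Possible usage with the log path from the script above:
# count_skipped_servers /spool/wwwoffle/html/htdig/wwwoffle-htdig.log
```

Comparing this count between a run with server_wait_time=0 and a run with a
non-zero value would show whether the wait really reduces the number of
skipped servers.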

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
