Re: [htdig] newbie - help! crawling google

Tony Crockford Mon, 26 Apr 2004 09:29:51 -0700

At 17:09 on Monday, 26 Apr 2004, Anu Vaidyanathan wrote:

Tony,

1, why are you trying to index google.com and not a site that you are in
control of? What results do you expect?


I expect it to give me a list of URLs it finds when it searches for a
certain string (say, "april fool"). I could use wget on the Google index
pages, but this doesn't seem to work for whatever reason. And after I get
that list, I hope this thing will recursively crawl and fetch each page
on that list. But you could argue that the second bit may not be
possible with htdig.

But google.com isn't the index, it's the search page, so there's not much there for htdig to index.

htdig crawls from page to page following links.

To get the results you're expecting, you'd have to crawl the results pages, but they're dynamic and don't exist until you run a search.
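
To illustrate the point about following links, here is a minimal Python sketch. It is not htdig's actual code, and the two HTML pages and the example.com URLs are made up for illustration. It only shows that a crawler which follows <a href> links finds almost nothing on a page that is mostly a search form, because the form's target only produces content once a query is submitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Forms, inputs, and scripts are ignored; only anchors count as links.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the list of absolute link targets found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical search front page: one form, one static link.
SEARCH_FORM_PAGE = """
<html><body>
  <form action="/search"><input name="q"><input type="submit"></form>
  <a href="/about.html">About</a>
</body></html>
"""

# Hypothetical ordinary page with plain links a crawler can follow.
ORDINARY_PAGE = """
<html><body>
  <a href="intro.html">Intro</a>
  <a href="/faq.html">FAQ</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links("http://example.com/", SEARCH_FORM_PAGE))
    print(extract_links("http://example.com/docs/", ORDINARY_PAGE))
```

The form's action URL never shows up in the list, which is why pointing a link-following indexer at a search front page yields so little.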

2, what do you get if you run rundig -vvv?

What about rundig -vv? It should be less verbose, but may give more meaningful information.

-------


Then I perform an htsearch and I get the mystical HTML file with this
output in the middle:
Unable to read word database file '/opt/www/htdig/db/db.words.db'
Did you run htmerge?

So did you run htmerge?


