At 17:09 on Monday, 26 Apr 2004, Anu Vaidyanathan wrote:

Tony,

1, why are you trying to index google.com and not a site that you are in
control of? What results do you expect?

I expect it to give me a list of URLs it finds when it searches for a certain string (say april fool) - I could use a wget on the google index pages but this doesn't seem to work for whatever reason. And after I get that list - I hope this thing will recursively crawl and fetch each page on that list - but, you could argue that the second bit may not be possible with htdig.

but google.com isn't the index - it's the search page - not much there for htdig to index.


htdig crawls from page to page following links.

to get the results you're expecting you'd have to crawl the results pages, but they don't exist until you do a search and they're dynamic.
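
To illustrate the point about following links, here is a minimal sketch of a link-following crawl. It is not how htdig is actually implemented, and the pages, paths and function names are made up for the example. The start page holds a search form and one ordinary link; only the linked page is ever reached, because the results page is never linked from anywhere.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory site: a search form plus one static link.
PAGES = {
    "/": '<form action="/search"><input name="q"></form>'
         '<a href="/about">About</a>',
    "/about": '<a href="/">Home</a>',
}


def crawl(start, pages):
    """Breadth-first crawl that only follows <a href> links."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        order.append(url)
        parser = LinkParser()
        parser.feed(pages[url])
        queue.extend(parser.links)
    return order


visited = crawl("/", PAGES)
print(visited)
```

The form's action (/search) never shows up in the visited list: a crawler that only follows links does not submit forms, so the dynamic results pages are invisible to it.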






2, what do you get if you do rundig -vvv ?

what about rundig -vv? It should be less verbose, but may give more meaningful information.


-------

then I perform an htsearch and I get the mystical html file with this output in the middle: Unable to read word database file '/opt/www/htdig/db/db.words.db' Did you run htmerge?


So did you run htmerge?

