There are two ways. The first is to stop the recursion using max_hop_count: http://www.htdig.org/attrs.html#max_hop_count
(This will prevent htdig from going in circles "forever.")
We set max_hop_count to 0 [is that OK?]. The htdig process completed, but when we try to search, it says htmerge failed.
Unless you have each URL that you want indexed listed individually, setting max_hop_count to 0 is most likely not what you want to do. This would result in the rejection of every page found, except for the page(s) listed in start_url. Choosing an appropriate value requires knowledge of the way the site is laid out. The value of max_hop_count is essentially a limit on how far htdig should drill down from the initial page. If you are trying to use this attribute to prevent recursion, you need to pick something large enough to prevent exclusion of legitimate pages, but small enough to limit extensive recursion.
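To make the effect of the limit concrete, here is a toy breadth-first crawler in Python. It is not htdig's implementation; it assumes a "hop" is one link followed outward from a start_url page. The names crawl_with_hop_limit and fake_site are hypothetical, and fake_site stands in for a real site that generates an endless series of distinct URLs (/blah/, /blah/blah/, ...), the kind of "circle" that a simple already-seen check cannot stop.

```python
from collections import deque


def crawl_with_hop_limit(get_links, start_urls, max_hop_count):
    """Return {url: hop} for every URL reached within max_hop_count hops.

    get_links is a hypothetical callable returning the links on a page;
    a real crawler would fetch and parse the page instead.
    """
    hops = {url: 0 for url in start_urls}
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hop_count:
            continue  # at the limit: keep this page but do not follow its links
        for target in get_links(url):
            if target not in hops:  # skip URLs that were already queued
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


def fake_site(url):
    # Hypothetical site: every directory page links one level deeper,
    # so a new, never-before-seen URL appears at every hop.
    if url.endswith("/"):
        return [url + "blah/", "http://www.foo.com/about.html"]
    return []
```

With max_hop_count set to 0, only the start page survives; with 2, the crawl stops after http://www.foo.com/blah/blah/ instead of descending forever. Picking the value is the trade-off described above: too small and legitimate pages deeper in the site are dropped too.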
As for the bit about "htmerge failed", how did you build the index? Did you use the rundig script? Or did you just run htdig? The rundig script runs htmerge automatically, but if you instead use htdig directly, it is necessary to manually run htmerge afterwards.
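If you are driving htdig yourself rather than through rundig, the two steps have to run in order. The Python sketch below assumes both programs are on PATH and accept `-c <config file>` (check the man pages for your version). It does not try to reproduce anything else rundig may do. The function name build_index and the config path in the usage line are hypothetical.

```python
import subprocess


def build_index(config_file, run=subprocess.run):
    """Run htdig, then htmerge, against the same config file.

    ASSUMPTION: both binaries are on PATH and take `-c <config file>`.
    `run` is injectable so the sequence can be checked without htdig installed.
    """
    commands = [
        ["htdig", "-c", config_file],    # step 1: dig (gather documents)
        ["htmerge", "-c", config_file],  # step 2: merge into searchable databases
    ]
    for cmd in commands:
        # check=True raises subprocess.CalledProcessError if a step exits
        # non-zero, so htmerge is never run after a failed dig.
        run(cmd, check=True)
    return commands


# Usage (hypothetical path): build_index("/etc/htdig/htdig.conf")
```

Stopping on the first failure is deliberate: if htmerge never runs, or runs after a broken dig, searches fail in the way described above.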
Second, try running "htdig -v" and save the output to look through for infinite loops, e.g.
http://www.foo.com/blah/blah/blah/blah/blah/
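Looking through a long verbose log by eye is tedious, so a small script can flag URLs like the one above. This Python sketch does not assume any particular layout for htdig -v output. It only assumes that URLs appear somewhere on the lines, and it flags a URL whose path repeats the same segment several times in a row. The function name find_looping_urls and the threshold min_repeats are hypothetical choices.

```python
import re

# Loose URL matcher; stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_looping_urls(log_lines, min_repeats=3):
    """Return URLs whose path repeats one segment min_repeats+ times in a row."""
    suspects = []
    for line in log_lines:
        for url in URL_RE.findall(line):
            # Drop the scheme and host, keep only the path.
            path = url.split("://", 1)[1].partition("/")[2]
            segments = [s for s in path.split("/") if s]
            run = 1
            for prev, cur in zip(segments, segments[1:]):
                run = run + 1 if cur == prev else 1
                if run >= min_repeats:
                    suspects.append(url)
                    break  # one hit is enough for this URL
    return suspects
```

To use it, redirect the htdig -v output to a file and pass the open file object, since iterating over a file yields lines. Note that it only catches a segment repeated consecutively. A loop that alternates segments, such as /a/b/a/b/, would need a different check.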
Tried this... it just freezes, and we don't see any infinite loop :-(
The freeze was likely due to some other sort of problem (network issues maybe). If htdig itself is working, which it must be if you are indexing sites, then there is no reason why -v would change that. The only thing -v does is enable extra output from htdig.
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html

