> > We set max_hop_count to 0 [is that ok?]. The htdig process completed,
> > but when we try to search, it says htmerge failed.

Unless you have each URL that you want indexed listed individually,
setting max_hop_count to 0 is most likely not what you want. That would
cause every page found to be rejected, except for the page(s) listed in
start_url. Choosing an appropriate value requires knowledge of how the
site is laid out: max_hop_count is essentially a limit on how far htdig
should drill down from the initial page. If you are trying to use this
attribute to prevent recursion, pick a value large enough not to exclude
legitimate pages, but small enough to limit extensive recursion.
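To make the advice above concrete, here is a sketch of an htdig.conf fragment. max_hop_count, start_url, and limit_urls_to are real htdig attributes, but the site URL and the value 4 are illustrative assumptions, not recommendations for any particular site:

```
# Illustrative htdig.conf fragment (values are assumptions).
start_url:      http://www.example.com/
limit_urls_to:  ${start_url}
# Follow links at most 4 hops away from start_url; anything deeper is
# rejected.  A value of 0 would index only start_url itself.
max_hop_count:  4
```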
We set max_hop_count to 4 and it works. However, as you folks rightly
mentioned earlier, the problem is an infinite loop. We are using Apache
on Linux. The problem is this:

http://www.thatscricket.com/women_cricket/index.html is a valid URL,
but http://www.thatscricket.com/women_cricket/index.html/ also seems
to be valid :-( In some files the HTML folks have added the trailing
"/" by mistake. That is something I am having fixed; in the meantime,
how do I get the web server to report
http://www.thatscricket.com/women_cricket/index.html/ as a bad URL
(404)?

--
B.G. Mahesh
mailto:[EMAIL PROTECTED]
http://www.indiainfo.com/
India's first ISO certified portal
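One likely explanation: Apache treats a trailing slash after a real file as PATH_INFO, so index.html/ is served as index.html. Assuming Apache 2.0.30 or later, the AcceptPathInfo directive can disable this; the directory path below is a placeholder assumption for the affected document root:

```
# Sketch for httpd.conf (or the equivalent .htaccess).  With
# AcceptPathInfo Off, a request like /women_cricket/index.html/
# (trailing slash after a real file) returns 404 Not Found instead of
# being served as index.html.
<Directory "/var/www/html">
    AcceptPathInfo Off
</Directory>
```

Note this is an Apache 2.x directive; Apache 1.3 has no direct equivalent, so there a mod_rewrite rule rejecting such URLs would be needed instead.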

