>
> We set max_hop_count to 0 [is that ok?]. The htdig process completed
> but when we try to search it says htmerge failed.

Unless you have each URL that you want indexed listed individually,
setting max_hop_count to 0 is most likely not what you want to do. It
would result in the rejection of every page found, except for the
page(s) listed in start_url. Choosing an appropriate value requires
knowledge of how the site is laid out. The value of max_hop_count is
essentially a limit on how far htdig should drill down from the
initial page. If you are using this attribute to guard against
infinite recursion, pick a value large enough to avoid excluding
legitimate pages, but small enough to keep the crawl bounded.

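For example, a minimal htdig configuration sketch (the host here is
illustrative, not taken from the original message):

    # htdig.conf: crawl starts at start_url; any page more than
    # max_hop_count links away from it is rejected
    start_url:      http://www.example.com/
    max_hop_count:  4

With max_hop_count set to 0, only the start_url page(s) themselves
would be indexed.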

We set max_hop_count to 4 and it works. However, as you folks rightly
pointed out earlier, the real problem is an infinite loop.

We are using Apache on Linux. The problem is this.

http://www.thatscricket.com/women_cricket/index.html is a valid URL,
but http://www.thatscricket.com/women_cricket/index.html/ is also served
successfully :-( In some files the HTML folks have mistakenly added the
trailing "/". I am having that fixed, but in the meantime, how do I get
the web server to report
http://www.thatscricket.com/women_cricket/index.html/ as a bad URL (404)?
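Apache serves such URLs because it treats the part after "index.html" as
trailing PATH_INFO handed to the file. If you are running Apache 2.0 or
later, one way to reject these is the AcceptPathInfo directive (this is a
sketch; the directive does not exist in Apache 1.3):

    # httpd.conf (Apache 2.0+): refuse requests where a trailing path
    # follows an existing file, so /index.html/ returns 404 Not Found
    # instead of serving /index.html
    AcceptPathInfo Off

On Apache 1.3 you would instead need something like a mod_rewrite rule
matching ".html/" to deny such requests.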



-- 
B.G. Mahesh                    
mailto:[EMAIL PROTECTED]
http://www.indiainfo.com/
India's first ISO certified portal


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
