According to Daniel Escobar:
> I apologize if this is a dumb question, but why is it that one of the
> databases I'm building for one of my domains has been building for
> over 24 hours!! What is really strange is that other domains I have
> took only 20 minutes, and there are more pages in those domains than
> in the one that is taking forever! Below is an example of the output
> from rundig (verbose mode) from the domain I'm having problems with:
>
> 106142:106142:2:http://hawaiian105.com/index.html/things_to_do/music/things_to_do/fun/things_to_do/about_us/music/about_us/fun/:
> *-****-**--++*++*********-*******************************---*-********-**- size = 17340
>
> Why are all those directories after index.html???? Does anyone have
> any idea why the other domains don't do that, but this one does? Because
> of that, one of the database files is over 500 MB when it should be
> about 9 MB or so.
I've seen this happen when htdig hits a link to an SSI page and the URL has an extra trailing slash. This can happen with either .shtml pages or .html pages that use the XBitHack. There are two things you can do:

1) hunt down the pages with the incorrect links, i.e. search for ".html/" in URLs in your documents, and fix these links; or
2) add .shtml/ .html/ to your exclude_urls setting to get htdig to ignore these defective links.

Option 2 is obviously easier, but you run the risk that htdig will miss some SSI pages if the only links to them have the trailing slash, so you may want to try hunting down the links anyway.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
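The first option, hunting down the defective links, can be partly automated. The Python sketch below is an illustration only and is not part of ht://Dig. It assumes the site's documents are available as local files under one directory and that the links appear in href or src attributes; the script and function names (bad_links, scan_tree) are hypothetical. It walks the tree and reports every link containing ".html/" or ".shtml/", the pattern suggested above.

```python
"""Hypothetical helper: report links with a slash after .html/.shtml.

Assumes the site's pages are local files under one directory and that
links live in href/src attributes. Not part of ht://Dig.
"""
import os
import re
import sys
from html.parser import HTMLParser

# Matches ".html/" or ".shtml/" anywhere in a URL (case-insensitive).
BAD_SUFFIX = re.compile(r"\.s?html/", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes from one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; valueless attributes give None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def bad_links(html_text):
    """Return the links in html_text that contain .html/ or .shtml/."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return [link for link in parser.links if BAD_SUFFIX.search(link)]


def scan_tree(root):
    """Yield (path, link) for each defective link in .html/.shtml files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith((".html", ".shtml")):
                continue
            path = os.path.join(dirpath, filename)
            # latin-1 decodes any byte sequence, so odd encodings won't crash.
            with open(path, encoding="latin-1") as f:
                for link in bad_links(f.read()):
                    yield path, link


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, link in scan_tree(sys.argv[1]):
        print(f"{path}: {link}")
```

Run it as, for example, `python find_bad_links.py /path/to/docroot` and fix each reported link. Links built at run time (by SSI includes or scripts) won't show up in the static files, so adding .shtml/ .html/ to exclude_urls as described in option 2 may still be worthwhile as a safety net.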

