According to Daniel Escobar:
> I apologize if this is a dumb question, but why is it that one of the
> databases that I'm building for one of my domains, has been building
> for over 24 hours!! What is really strange, is that other domains that
> I have only took 20 minutes, and there are more pages in those domains
> than in the one that is taking for ever! Below is an example of the
> output from rundig (verbose mode) from the domain that I'm having
> problems with:
> 
> 
> 106142:106142:2:http://hawaiian105.com/index.html/things_to_do/music/things_to_do/fun/things_to_do/about_us/music/about_us/fun/:
> *-****-**--++*++*********-*******************************---*-********-**- size = 17340
> 
> Why are all those directories after index.html????  Does anyone have
> any idea why the other domains don't do that, but this one does? Due
> to that, the file size of one of the db's is over 500 megs when it
> should be about 9 or so.

I've seen this happen when htdig hits a link to an SSI page and the URL
has an extra trailing slash.  This can happen with either .shtml pages or
.html pages that use the XBitHack.  There are two things you can do:

1) hunt down the pages with the incorrect links, i.e. search for ".html/"
in URLs in your documents, and fix these links; or

2) add .shtml/ and .html/ to the exclude_urls attribute in your htdig
configuration file, so that htdig ignores these defective links.
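
For option 1, something along these lines might help with the hunt. It is
only a rough sketch in Python, not part of htdig: the names BAD_LINK,
LinkCollector, find_bad_links and scan_tree are made up for illustration,
and it assumes your pages sit as plain files under a single document root.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical pattern: ".html/" or ".shtml/" anywhere in a link target.
BAD_LINK = re.compile(r"\.s?html/", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes from every tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_bad_links(html_text):
    """Return the link targets in html_text that contain .html/ or .shtml/."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [url for url in collector.links if BAD_LINK.search(url)]


def scan_tree(root):
    """Yield (file path, bad link) pairs for the HTML files under root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in (".html", ".shtml", ".htm"):
            text = path.read_text(errors="replace")
            for link in find_bad_links(text):
                yield path, link
```

Calling scan_tree() on your document root and printing each pair it yields
gives a list of files to fix. Links that are generated by the SSI includes
themselves will only show up if the included files are under that root too.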

Option 2 is obviously easier, but you run the risk that htdig will miss
some SSI pages if the only links to them have the trailing slash, so you
may want to try hunting down the links anyway.
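
As for why the directories pile up after index.html: my understanding (an
assumption about your server, not something I have checked on it) is that
the server returns the same page for index.html/ as for index.html, so
every relative link on that page gets resolved against the longer URL. The
Python sketch below only illustrates that URL arithmetic. The link names
are guessed from the URL in your rundig output, and crawl_chain is a
made-up helper.

```python
from urllib.parse import urljoin


def crawl_chain(start, relative_links):
    """Resolve each relative link against the URL reached so far."""
    url = start
    chain = [url]
    for link in relative_links:
        url = urljoin(url, link)
        chain.append(url)
    return chain


# Assumed relative links, guessed from the URL in the rundig output.
links = ["things_to_do/", "music/", "things_to_do/"]

# Starting from the defective URL, each step nests one level deeper.
for url in crawl_chain("http://hawaiian105.com/index.html/", links):
    print(url)

# Without the trailing slash the first link resolves where it should.
print(urljoin("http://hawaiian105.com/index.html", "things_to_do/"))
```

Since each of those longer URLs looks like a new document to htdig, the dig
keeps going and the database keeps growing.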

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
