RE: [htdig] part of the website can't be indexed

Stephen L Arnold Wed, 14 Nov 2001 18:49:22 -0800

On 15 Nov 01, at 8:40, Tony Melia wrote:

> run it again with a -v at the end of whatever command you used
> to index it in the first place and watch the output.  If it takes
> a while try 'rundig -v >/tmp/mylog.txt' and go through the log
> file when it has finished.


> From: edwin lin [mailto:[EMAIL PROTECTED]]
> 
> I run htdig to dig two web sites. One website is only partially
> indexed. What went wrong? Thanks, Yixiong 

The above is good advice, but it sounds like the part that's not 
getting indexed may not be reachable, at least not by following 
links from the top-level URL.  Remember, htdig doesn't index files 
on your hard drive; it can only follow URLs.
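
To make the reachability point concrete, here is a rough sketch in 
Python.  It assumes you already have a map of which pages link to 
which (the `links` dictionary and the example URLs are made up for 
illustration; this is not anything htdig produces itself) and it 
ignores htdig's own URL filters.  It simply lists the pages that 
cannot be reached by following links from the start URL.

```python
from collections import deque


def reachable_pages(links, start_url):
    """Return the set of pages reachable from start_url by following links.

    `links` is a hypothetical dict mapping a page URL to the URLs it links to.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        # Pages with no entry in the map are treated as having no links.
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def unreachable_pages(links, start_url):
    """Return a sorted list of known pages a link-following crawler would miss."""
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    return sorted(all_pages - reachable_pages(links, start_url))


if __name__ == "__main__":
    # Made-up site: /docs/hidden.html exists but nothing on the
    # path from the start page links to it.
    site = {
        "http://www.example.com/": ["http://www.example.com/a.html"],
        "http://www.example.com/a.html": ["http://www.example.com/b.html"],
        "http://www.example.com/docs/hidden.html": ["http://www.example.com/a.html"],
    }
    for url in unreachable_pages(site, "http://www.example.com/"):
        print(url)
```

Anything this prints would need a link added somewhere on the site, 
or its own entry in the list of starting URLs.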

I used htdig on a little document server project (for internal 
company use), and we ended up using a home-spun Perl script to 
create a list of URLs pointing to individual M$ Word documents in a 
fairly complicated directory structure.  It works great.  Indexing 
also worked fine with Apache's automatic directory indexing, but 
then the index pages sometimes cluttered up the search results.  
That was a management peeve, so we turned off Apache indexing and 
went with the above work-around.
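
For anyone wanting to try the same work-around, here is a minimal 
sketch of the idea in Python.  It is not the Perl script mentioned 
above, just an illustration under some assumptions: the documents 
live under one directory that the web server exposes at a known 
base URL, and the function names, paths and example URL are all 
made up.

```python
import html
import os
from urllib.parse import quote


def build_url_list(doc_root, base_url, extensions=(".doc",)):
    """Walk doc_root and return a sorted list with one URL per matching file.

    doc_root and base_url are assumed to refer to the same tree: the
    directory on disk and the URL the web server publishes it under.
    """
    suffixes = tuple(ext.lower() for ext in extensions)
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
                # Use forward slashes and percent-encode spaces and the like.
                rel_url = quote(rel.replace(os.sep, "/"))
                urls.append(base_url.rstrip("/") + "/" + rel_url)
    return sorted(urls)


def write_link_page(urls, out_path):
    """Write a bare HTML page with one link per URL for the indexer to follow."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("<html><body>\n")
        for url in urls:
            safe = html.escape(url, quote=True)
            fh.write('<a href="%s">%s</a><br>\n' % (safe, safe))
        fh.write("</body></html>\n")


if __name__ == "__main__":
    # Hypothetical locations; adjust to your own server layout.
    found = build_url_list("/var/www/docs", "http://intranet.example.com/docs")
    write_link_page(found, "/var/www/docs/all_documents.html")
```

The generated page can then be used as a starting URL for the dig, 
so every document is one hop away and no directory index pages end 
up in the search results.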

Come to think of it, I promised to contribute that script a while 
back; who should I send it to?  The guy who wrote it isn't into 
supporting it, but he's okay with me sending it in.  It's not that 
cryptic (as far as Perl goes, anyway).  Let me know...

HTH, Steve

*************************************************************
Steve Arnold                           http://arnolds.dhs.org

Java is for staying up late while you program in Python...

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
