According to Richard Andrews:
> Hi Gilles,
> 
> well I managed to get the thing working and here's what I ended up doing. If
> there is anything "wrong" or "dirty hack" with this I'd be interested to
> know.
> 
> * By changing the start URLs to http://localhost/etc/etc/etc/ I got half of
> the documentation indexed but not the other half.
> The other half was continually indexed using file://localhost/etc/etc/etc
> even though it had the same start_url format.
> 
> * Then I eventually realised that the second half of the indexable docs were
> in directory "kdelibs-3.0.0" but were referenced by a symbolic link
> "kdelibs-3". The start url pointed to "kdelibs-3" since I thought that the
> htdig program would follow the link to the referenced directory. Instead it
> just used to say "http://localhost/blah/blah/blah/kdelibs-3/ cannot be found"
> and then went on to index the referenced directory using the
> "file://localhost/etc/etc/etc/kdelibs-3.0.0/" form - and at the end of the
> process documents htdig'ed with "file://localhost/" were not searchable cf.
> those htdig'ed with "http://localhost/" (tested within KDE).

Well, I'm having trouble visualizing the whole directory structure and
where the symlinks point, but local_urls processing shouldn't get thrown
off by symlinks.  You do have to get rid of all vestiges of file:/ URLs
in your config, though, and reindex from scratch.  If you have any remaining
file:/ URLs anywhere in your config or in your database, they're going to
cause problems with 3.1.x.

> * By making the start URL point directly to the actual directory it worked
> but in order to open the documents from their links in the htdig search page
> within KDE I had to configure an apache Named virtual-host "localhost" with
> "/" as its webroot - very dirty and very unsafe. I'll need to play around
> with that I think.

Yuck!  Get rid of that, or you're probably asking for trouble.  It would
be better to find out what is the common directory that all your document
files share, and pick that as your DocumentRoot, instead of using / as the
DocumentRoot.  Then, the common directory path name wouldn't be used in
the URLs of the documents, but only on the right-hand side of the local_urls
definition.
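
Just as a rough sketch of what I mean, and assuming a made-up common
directory of /path/to/common/docs (a placeholder, substitute whatever
directory your documents really share), the two configs might look
something like this:

```
# httpd.conf (Apache): serve only the common documentation directory,
# not "/". The path here is a hypothetical placeholder.
DocumentRoot "/path/to/common/docs"

# htdig.conf: the URL prefix goes on the left, the matching filesystem
# path on the right-hand side of local_urls. Same placeholder path.
start_url:   http://localhost/
local_urls:  http://localhost/=/path/to/common/docs/
```

That way the directory path shows up only in DocumentRoot and in
local_urls, never in the document URLs themselves.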

> * The only way I found (through experimenting) to stop the file:// indexing
> was to REMOVE the local_urls: http://localhost/=/ directive altogether.
> Otherwise it kept trying to index the second half with file:// URLs all the
> time - even when the start URL was pointing towards the actual directory
> (and not the symbolic link) - is that because the file:// URLs were listed
> in the doc.index.db?

Well, if we are indeed still talking about 3.1.x, and not 3.2 betas,
then htdig doesn't look at db.docs.index - only db.wordlist and db.docdb,
and these two files should be regenerated from scratch with "htdig -i"
(or rundig).  If you're reindexing from scratch, and don't have any file:/
URLs anywhere in your config, then I can't imagine where htdig would
pick up any such URLs.  file:/ URLs in hrefs are rejected by htdig 3.1.x.
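
If it helps, here is a small sketch of how you might check the config for
leftover file:/ URLs before reindexing.  The function name and the config
path are my own assumptions for illustration, not anything that ships
with htdig:

```shell
# Hypothetical location; point this at wherever your htdig.conf really lives.
CONF="${CONF:-/etc/htdig/htdig.conf}"

# Print any lines that still mention file:/ URLs.
# Returns 0 only when no such lines remain in the given file.
check_no_file_urls() {
    if grep -n 'file:/' "$1"; then
        echo "file:/ URLs still present in $1" >&2
        return 1
    fi
    return 0
}
```

Once that comes back clean, something like
check_no_file_urls "$CONF" && rundig -c "$CONF"
would rebuild the databases from scratch.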

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
