According to Carl Edwards:
> With your suggestion won't I end up with all the 
> URLs that don't require authorization in the
> db twice?  Once with each server name.
> 
> How about if I index first the insecure server
> then the secure server with a server alias?
> So if it finds the file already exists in the
> db under another name it will not reread it?

Do you really think it's a good idea to put your secure and insecure
documents into the same database?  This opens up a big security hole if
this database is searchable from the insecure server.  (See
http://www.htdig.org/FAQ.html#q4.20)

You'd probably be better off indexing the secure and insecure sites
separately, and making only the insecure database available from the
insecure site.  For the secure site, you could either offer two separate
searches, with 2 different config files, or you could merge the two
databases together and make that available only on the secure site.
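
As a rough sketch of the two-config approach, reusing the host names and
paths from Geoff's example below (the file names and database_dir
locations are made up for illustration, so adjust them to your setup):

    # insecure.conf (hypothetical name): public documents only
    database_dir:   /var/lib/htdig/insecure
    start_url:      http://server1.foo.com/
    local_urls:     http://server1.foo.com/=/path/to/files/

    # secure.conf (hypothetical name): searchable from the secure site only
    database_dir:   /var/lib/htdig/secure
    start_url:      http://secure1.foo.com/
    # credentials left elided, as in Geoff's example
    authorization:  ...
    local_urls:     http://secure1.foo.com/=/path/to/files/

You'd then run htdig and htmerge once per file with -c, and have each
search form pass the matching name in htsearch's "config" input
parameter.  If you go the merge route instead, check the htmerge
documentation for your version to see whether it has the -m option.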

I'm not sure what you're suggesting by using a server alias, but it
doesn't sound to me like this would work, security considerations aside.
If you use server_aliases, then htdig will try to fetch the documents
using the "canonical" server name, so it would be essentially the same
as if you only had the one server listed in your start_url.
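
To illustrate, again with the host names from Geoff's example (I'm
writing the syntax from memory, so check it against the server_aliases
entry in the attribute documentation):

    start_url:       http://server1.foo.com/ http://secure1.foo.com/
    server_aliases:  secure1.foo.com:80=server1.foo.com:80

With that alias in place, every http://secure1.foo.com/ URL gets
rewritten to http://server1.foo.com/ before it's fetched, so nothing is
ever requested from, or stored under, the secure server's name.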

> >From: Geoff Hutchison <[EMAIL PROTECTED]>
...
> >> What I would really like is to have both servers
> >> in the start_url list and if a requires authentication
> >> response is received from the request then authenticate
> >> and save that server's name in the db.
> >
> >If you enter both servers in the start_url, it will index both. You 
> >can't do quite what you want, since an authentication will be sent 
> >regardless of whether it's asked. (This doesn't make much difference 
> >since the non-authenticating server will just ignore that header.)
> >
> >So what I'd suggest is something like this:
> >
> >start_url: http://server1.foo.com/ http://secure1.foo.com/
> >authorization: ...
> >local_urls: http://server1.foo.com/=/path/to/files/ \
> >             http://secure1.foo.com/=/path/to/files/
> >
> >Since each document has one URL and only one URL, you'll need to index 
> >"twice." Of course since you're indexing on the filesystem and your OS 
> >will cache files, indexing should be relatively quick.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
