Hi folks,

I'm just setting up htdig, and running into a few configuration
problems I hope you can help me with.  

I have two databases on this computer and may add more later.  The
first is public and indexes several work websites (I teach at a
university); the second is private and indexes some family-related
sites.  I'm using the debian sid/unstable packages, version
3.2.0b5-5 (the -5 is the debian package version).

For the public index, I use the default locations for conf files, html
templates, and database; in debian these are /etc/htdig, /etc/htdig
again, and /var/lib/htdig.

I want to be careful to keep the other index private, so I keep the
conf files & html templates in /etc/htdig/htdig-local, and the
database in /var/lib/htdig-local.  As suggested in the docs & various
posts on this list, I access the private index using a one-line wrapper script
in a password-protected location.  Here's the script:  

#!/bin/sh
# for some reason, having trouble passing the COMMON_DIR variable
COMMON_DIR=/etc/htdig/htdig-local CONFIG_DIR=/etc/htdig/htdig-local \
    /usr/lib/cgi-bin/htsearch ${1+"$@"}

-----
as the comment indicates, the script doesn't work perfectly.  Htsearch
searches the private index, which is great, but it doesn't return
results using the modified templates (header.html, footer.html,
wrapper.html) that I've put into /etc/htdig/htdig-local.  This is a
pain and a little confusing, since I can't tell from the form which
index I'm searching.
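In case it's something basic on the shell side, here's a quick sanity
check (nothing htdig-specific; the /tmp paths are just placeholders)
showing that an inline assignment only reaches the command on the same
line, so a stray line break in the wrapper would silently drop the
variables:

```shell
#!/bin/sh
# Inline assignment: visible only to the command on the SAME line.
COMMON_DIR=/tmp/one printenv COMMON_DIR   # prints /tmp/one
# The assignment above did NOT persist into the shell itself:
printenv COMMON_DIR || echo "unset"       # prints "unset"
# Exporting makes it visible to every subsequent command:
COMMON_DIR=/tmp/two; export COMMON_DIR
printenv COMMON_DIR                       # prints /tmp/two
```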

So my main question: is there something wrong with my script and/or
setup, or is this a bug in the debian packages and/or the new beta?


I also have a question of secondary importance.  My course websites
have a fair number of external links.  I would love for ht://dig to
index the pages linked to, but NOT keep crawling further along the
chain of links.  That is, when ht://dig sees a link to an external
page, it would follow that link and index it, but go no further.
Even better would be if the links wget calls "page requisites" --
links that need to be loaded in order to view a page properly --
were also indexed.
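The closest attribute I've found in the config docs is max_hop_count,
but as I read it that caps total crawl depth from the start URLs
rather than hops past an external boundary, so it's not quite what I'm
after -- e.g.:

```
# htdig.conf sketch -- assumes max_hop_count works as I've read it:
# it limits how many links deep htdig follows overall, with no notion
# of "one hop past an external link".
max_hop_count: 2
```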

In conjunction with this, I'd also love it if htdig could, like wget,
use Mozilla's cookies file to access login-controlled sites like the
New York Times.
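If I'm reading the 3.2 attribute list right, there's a
cookies_input_file attribute that supposedly reads a Netscape-format
cookies file (which is what Mozilla writes), but I haven't tested it
-- something like this untested sketch (the path is just where my
profile happens to keep it):

```
# untested -- assumes cookies_input_file exists in 3.2 and takes a
# Netscape-format cookies.txt; path below is hypothetical
cookies_input_file: /home/matt/.mozilla/cookies.txt
```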

So, is it possible for htdig to do this?  Has anyone else tried it?


Anyway, thanks loads for the help,

Matt


_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
