At 00:05 on Thursday, 21 June 2001, you wrote:
> According to Pietro Palladino:
> > Problem 1: my site has both html pages and documents (.doc, .rtf, etc.).
> > Is there a way to index html pages in a database and documents in a
> > different one?
> >
> > Problem 2: I'd like to search a word only in one of the two databases,
> > how could I implement this kind of search?
>
> First of all, are you sure you need 2 databases? You could do all
> this using the restrict and exclude input parameters to htsearch,
> to select which file extensions will be allowed in search results.
> See http://www.htdig.org/FAQ.html#q4.20
Well, I don't know whether I really need 2 databases. So far I've indexed
almost 760 documents and more than 450 pages, so you can imagine how slow
htdig is now when I search for something.
That is why I'd like to use 2 different databases: the searches should be
faster that way (I suppose).
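
For reference, a minimal sketch of the single-database approach suggested
above, written as a small Python helper that builds the htsearch query URL.
The CGI address and the extension list are placeholders, and I am assuming
that several patterns are joined with "|" in the restrict and exclude
parameters (hts_form.html is the place to confirm this):

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to the real htsearch CGI location and extensions.
HTSEARCH_URL = "http://www.example.com/cgi-bin/htsearch"
DOC_EXTENSIONS = [".doc", ".rtf"]


def build_search_url(words, only="all"):
    """Return an htsearch URL limited to documents, HTML pages, or everything."""
    params = {"words": words}
    # Assumption: htsearch accepts several patterns separated by "|".
    pattern = "|".join(DOC_EXTENSIONS)
    if only == "documents":
        params["restrict"] = pattern  # keep only URLs matching the pattern
    elif only == "pages":
        params["exclude"] = pattern   # drop URLs matching the pattern
    return HTSEARCH_URL + "?" + urlencode(params)
```

Calling build_search_url("borse", only="documents") gives a URL that searches
the documents only, and only="pages" gives the opposite, both against the same
database.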
>
> If you still want to keep 2 separate databases, it can be a bit tricky.
> Excluding the .doc, .rtf, etc. from the HTML page DB is easy - just add
> these to bad_extensions. Making a DB that excludes the HTML pages is
> a bit harder, because normally you count on the HTML links to find your
> way to the other documents. You'd need to build a list of URLs for the
> documents you want for this 2nd DB, as shown towards the end of FAQ 5.25.
> Then, you can select one of 2 config files from the search form, using
> the config input parameter (see http://www.htdig.org/hts_form.html), where
> each config file defines the database_dir or database_base for its own DB.
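
A minimal sketch of the "build a list of URLs" step, assuming the full list
of URLs is already available (FAQ 5.25 describes how to obtain it). The
function name and the extension set are only illustrative; the documents
list is what would go into start_url in the config file for the 2nd DB:

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of document extensions; extend it to match the site.
DOC_EXTENSIONS = {".doc", ".rtf"}


def split_urls(urls):
    """Split URLs into (pages, documents) by the extension of the URL path."""
    pages, documents = [], []
    for url in urls:
        # Look only at the path, so a query string cannot confuse the check.
        ext = posixpath.splitext(urlsplit(url).path)[1].lower()
        if ext in DOC_EXTENSIONS:
            documents.append(url)
        else:
            pages.append(url)
    return pages, documents
```
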
OK, I'll try, but first I need to understand a few things, because my ideas
are a bit confused right now. Please follow me through these steps. Let's
start from the beginning :-)
1. Indexing
OK, I installed htDig from an RPM file, so I couldn't set variables such as
$BINDIR, $DBDIR, etc. Anyway, it now works with the default options, so I
edited the htdig.conf file and changed only some of them:
-external_parsers,
-database_dir,
-start_url,
-search_algorithm --> I kept only "exact:1" because I have neither an Italian
dictionary to use with htDig nor a dictionary of "stop words". I have
nothing but the default stuff that comes with the htdig archive :-(
Now, suppose I don't want to use the rundig script: what steps do I have to
follow to index my site?
I think the first one is: "htdig -i -a -s -v"
"-i" because I want to erase any previous index and rebuild the databases;
"-a" because I want to keep using the search engine while it is reindexing my
site, so I reindex the site on a second copy of the databases.
Is everything right up to this point?
Ok, now I need htpurge: "htpurge -a -v"
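
The two commands so far can be sketched as a small Python helper that only
assembles the command lines. The function names are made up for this
example, and the optional "-c" config file argument is an assumption that
goes beyond the commands quoted here:

```python
import subprocess


def index_commands(config_path=None, alt_work_area=True):
    """Return the htdig and htpurge command lines for a full reindex."""
    htdig = ["htdig", "-i"]        # -i: start from scratch, drop the old index
    htpurge = ["htpurge"]
    if alt_work_area:
        htdig.append("-a")         # -a: build in a second copy of the databases
        htpurge.append("-a")
    htdig += ["-s", "-v"]          # -s: statistics, -v: verbose
    htpurge.append("-v")
    if config_path:
        # Assumption: both programs take the config file through -c.
        htdig += ["-c", config_path]
        htpurge += ["-c", config_path]
    return [htdig, htpurge]


def run_all(commands):
    """Run the commands in order and stop at the first one that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With no arguments, index_commands() returns exactly the two command lines
quoted in this message; run_all(index_commands()) would then execute them
one after the other.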
Now I have a question: what does it purge?
When I run this program, I get messages like these:
htpurge: 1040
htpurge: 1050
htpurge: 1060
htpurge: 1070
Deleted, not found: ID: 813 URL:
http://www.unina.it/universit/concorsi/borse_ric/bandi/OLD/OLD/scalim1.doc
Deleted, not found: ID: 973 URL:
http://www.unina.it/universit/concorsi/personaleTA/ortob.doc
Deleted, not found: ID: 1040 URL: http://www.unina.it/rete/citta/repertori.php
OK, I think it didn't find those files, so it deleted them from the
database. But if I run htpurge again, I get the same messages again. So what
does it purge? Hmm...
I also get different messages, which don't appear when I run htpurge the
second time:
htpurge: Discarding affari
htpurge: Discarding agenzie
htpurge: Discarding allegato
htpurge: Discarding allegato
htpurge: Discarding allegato
htpurge: Discarding amministrazioni
htpurge: Discarding apporre
htpurge: Discarding area
htpurge: Discarding arte
What do they mean?
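
To compare two htpurge runs more easily, the output can be summarized with
a small script. This is only a sketch: it assumes the message formats are
exactly the ones quoted above, and it counts the messages without
explaining what they mean:

```python
import re
from collections import Counter

# Patterns copied from the htpurge messages quoted in this message.
DISCARDING = re.compile(r"htpurge: Discarding (\S+)$")
DELETED = re.compile(r"Deleted, not found: ID: (\d+) URL:")


def summarize_purge_log(text):
    """Return (Counter of discarded words, list of deleted document IDs)."""
    discarded = Counter()
    deleted_ids = []
    for line in text.splitlines():
        line = line.strip()
        match = DISCARDING.match(line)
        if match:
            discarded[match.group(1)] += 1
            continue
        match = DELETED.match(line)
        if match:
            deleted_ids.append(int(match.group(1)))
    return discarded, deleted_ids
```

Saving the output of each run to a file and comparing the two summaries
shows which messages are repeated and which appear only the first time.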
The first step stops here (I think that's enough). Please think of me as a
newbie, because all the docs I have are in English and I don't always
understand everything.
Thank you.
------------------------------------------------------------------------------------------------
See you soon!!!
Pietro Palladino
<[EMAIL PROTECTED]>