On Thursday, 21 June 2001, at 00:05, you wrote:
> According to Pietro Palladino:
> > Problem 1: my site has both html pages and documents (.doc, .rtf, etc.).
> > Is there a way to index html pages in a database and documents in a
> > different one?
> >
> > Problem 2: I'd like to search a word only in one of the two databases,
> > how could I implement this kind of search?
>
> First of all, are you sure you need 2 databases?  You could do all
> this using the restrict and exclude input parameters to htsearch,
> to select which file extensions will be allowed in search results.
> See http://www.htdig.org/FAQ.html#q4.20
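
A minimal sketch of the restrict/exclude suggestion above, written as a
small shell helper that only prints the search URL. The htsearch CGI
location and the helper name search_url are placeholders, not part of
htdig, and the "|" separator between patterns is an assumption to check
against the htsearch documentation.

```shell
#!/bin/sh
# Hypothetical htsearch CGI location; adjust to your installation.
HTSEARCH_URL="${HTSEARCH_URL:-http://www.unina.it/cgi-bin/htsearch}"

# Print a search URL for one word.
#   $1 = word to search for (assumed to need no URL-encoding)
#   $2 = "docs" to return only documents, "pages" to leave them out
search_url() {
    # %7C is the URL-encoded "|", assumed here to separate patterns.
    patterns=".doc%7C.rtf"
    case "$2" in
        docs)  echo "${HTSEARCH_URL}?words=$1&restrict=${patterns}" ;;
        pages) echo "${HTSEARCH_URL}?words=$1&exclude=${patterns}" ;;
        *)     echo "usage: search_url WORD docs|pages" >&2; return 1 ;;
    esac
}
```

Calling "search_url concorsi docs" prints a URL limited to .doc and .rtf
results, while "search_url concorsi pages" prints one that excludes them,
both against a single database.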


Well, I don't know if I really need 2 databases. In the meantime I've
indexed almost 760 documents and more than 450 pages. You can imagine how
slow htdig is now when I try to search for something.
This is why I'd like to use 2 different databases: that way the searches
will be faster (I suppose).


>
> If you still want to keep 2 separate databases, it can be a bit tricky.
> Excluding the .doc, .rtf, etc. from the HTML page DB is easy - just add
> these to bad_extensions.  Making a DB that excludes the HTML pages is
> a bit harder, because normally you count on the HTML links to find your
> way to the other documents.  You'd need to build a list of URLs for the
> documents you want for this 2nd DB, as shown towards the end of FAQ 5.25.
> Then, you can select one of 2 config files from the search form, using
> the config input parameter (see http://www.htdig.org/hts_form.html), where
> each config file defines the database_dir or database_base for its own DB.
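
A rough sketch of the two config files described above, written out by a
small shell script so the pieces can be seen side by side. Every path and
file name here (pages.conf, docs.conf, /var/lib/htdig/..., doc_urls.txt)
is a placeholder, and the extension list is only an example; the
external_parsers setting from the existing htdig.conf would still be
needed in the documents config.

```shell
#!/bin/sh
# Write two sketch config files into a scratch directory.
# Every path and file name below is a placeholder, not a real default.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# DB 1: HTML pages only. Setting bad_extensions replaces the default
# list, so keep whatever default entries you still need.
cat > "$CONF_DIR/pages.conf" <<'EOF'
database_dir:   /var/lib/htdig/pages
start_url:      http://www.unina.it/
bad_extensions: .doc .rtf .gif .jpg .zip .gz .tar .exe
EOF

# DB 2: documents only. start_url reads a prepared list of document
# URLs from a file (backquotes include the file's contents).
cat > "$CONF_DIR/docs.conf" <<'EOF'
database_dir:   /var/lib/htdig/docs
start_url:      `/etc/htdig/doc_urls.txt`
EOF

echo "wrote $CONF_DIR/pages.conf and $CONF_DIR/docs.conf"
```

Once the files sit in the directory where htsearch looks for its configs,
the search form would pass config=pages or config=docs (the file name
without ".conf") to pick the database.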


Ok, I'll try, but first I need to understand a few things because my ideas
are a little confused right now. Please follow me through these steps.
Let's start from the beginning :-) :

1. Indexing

Ok, I installed htDig from an RPM file, so I couldn't set variables such as
$BINDIR, $DBDIR, etc. Anyway, it works now with the default options,
so I edited the htdig.conf file and changed only some of them:
-external_parsers,
-database_dir,
-start_url,
-search_algorithm: I kept only "exact:1" because I have neither an Italian
dictionary to use with htDig nor a dictionary of "stop words". I have
nothing but the default stuff that comes with the htdig archive :-(
Well, let's suppose that I don't want to use the rundig script: what
steps do I have to follow to index my site?
Ok, I think: "htdig -i -a -s -v":
"-i" because I want to erase any previous index and rebuild the
databases;
"-a" because I want to keep using the search engine while it is reindexing
my site, so I reindex the site on a second copy of the databases.

Is everything right up to this point?

Ok, now I need htpurge: "htpurge -a -v"
Now I have a question: what does it purge?
When I run this program, I get messages like these:

htpurge: 1040
htpurge: 1050
htpurge: 1060
htpurge: 1070
Deleted, not found: ID: 813 URL: 
http://www.unina.it/universit/concorsi/borse_ric/bandi/OLD/OLD/scalim1.doc
Deleted, not found: ID: 973 URL: 
http://www.unina.it/universit/concorsi/personaleTA/ortob.doc
Deleted, not found: ID: 1040 URL: http://www.unina.it/rete/citta/repertori.php

Ok, I think it didn't find those files, so it deleted them from the
database. But if I run htpurge again, I get the same messages
again. So what does it purge? Hmm...
I also get different messages that don't appear when I run htpurge the
second time:

htpurge: Discarding affari
htpurge: Discarding agenzie
htpurge: Discarding allegato
htpurge: Discarding allegato
htpurge: Discarding allegato
htpurge: Discarding amministrazioni
htpurge: Discarding apporre
htpurge: Discarding area
htpurge: Discarding arte

What do they mean?
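
For reference, the two manual steps described in this section (htdig, then
htpurge, both on the alternate work files) could be wrapped in a small
shell function like the sketch below. The function name reindex and the
HTDIG / HTPURGE variables are made up for illustration; only the flags
already mentioned above plus -c (to name the config file) are used.

```shell
#!/bin/sh
# Set HTDIG / HTPURGE to full paths if the binaries are not on PATH.

# Rebuild one database from scratch on work copies.
#   $1 = config file to use
reindex() {
    conf="$1"
    # -i: start from scratch; -a: use alternate work files so the live
    # DB stays searchable; -s: print statistics; -v: verbose.
    "${HTDIG:-htdig}" -i -a -s -v -c "$conf" || return 1
    # Same -a so htpurge works on the copies htdig just wrote.
    "${HTPURGE:-htpurge}" -a -v -c "$conf" || return 1
}
```

It would be called as "reindex /path/to/htdig.conf", where the path is
whatever config file the installation actually uses.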


The first step stops here (I think it's enough). Please think of me as a
newbie, because all the docs I have are in English and I don't always
understand everything.

Thank you.

------------------------------------------------------------------------------------------------
 
See you soon!!!

                        Pietro Palladino
                         <[EMAIL PROTECTED]>

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html