Re: [htdig] 2 different search..

Pietro Palladino Thu, 28 Jun 2001 03:46:14 -0700
At 23:08 on Monday, 25 June 2001, you wrote:
> According to Pietro Palladino:
> > At 00:05 on Thursday, 21 June 2001, you wrote:
> > > According to Pietro Palladino:
> > > > Problem 1: my site has both html pages and documents (.doc, .rtf,
> > > > etc.). Is there a way to index html pages in a database and documents
> > > > in a different one?
> > > >
> > > > Problem 2: I'd like to search a word only in one of the two
> > > > databases, how could I implement this kind of search?
> > >
> > > First of all, are you sure you need 2 databases?  You could do all
> > > this using the restrict and exclude input parameters to htsearch,
> > > to select which file extensions will be allowed in search results.
> > > See http://www.htdig.org/FAQ.html#q4.20
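
A minimal sketch of the restrict/exclude approach described above, in Python. It assumes htsearch is reachable at the hypothetical path /cgi-bin/htsearch and that several patterns can be joined with "|" in a single restrict or exclude value; the helper name and the example extensions are illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical location of the htsearch CGI; adjust for the local install.
HTSEARCH_URL = "/cgi-bin/htsearch"


def search_url(words, restrict=(), exclude=()):
    """Build an htsearch GET URL whose results are limited by URL patterns."""
    params = [("words", words)]
    if restrict:
        # Assumption: multiple patterns are separated by "|".
        params.append(("restrict", "|".join(restrict)))
    if exclude:
        params.append(("exclude", "|".join(exclude)))
    return HTSEARCH_URL + "?" + urlencode(params)


# Search only the word-processor documents, or only everything else.
docs_only = search_url("budget", restrict=[".doc", ".rtf"])
pages_only = search_url("budget", exclude=[".doc", ".rtf"])
```

The same two values could equally be sent as hidden or select inputs of an HTML search form, so a single database can serve both kinds of search.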
> >
> > Well, I don't know if I really need 2 databases... In the meantime I've
> > indexed almost 760 documents and more than 450 pages. You can imagine
> > how slow htdig is now when I try to search for something.
> > This is the reason why I'd like to use 2 different databases: that
> > way the searches would be faster (I suppose).
>
> No, I can't imagine.  A total of 760+450 documents is still very small
> by ht://Dig standards.  Many users index over 10 times as many documents
> without any problems with searches being too slow.  Either you have a very
> slow machine, or there's a configuration problem of some sort.  How slow
> is slow?  How long does htsearch take to do a typical search on your site?

Far too long, almost 3 minutes!!! I don't know why. However, I usually don't 
start httpd on my Red Hat box; I start it only when I need to test htdig... I've 
noticed that I need to rerun rundig to get the best performance.

>
> > > If you still want to keep 2 separate databases, it can be a bit tricky.
> > > Excluding the .doc, .rtf, etc. from the HTML page DB is easy - just add
> > > these to bad_extensions.  Making a DB that excludes the HTML pages is
> > > a bit harder, because normally you count on the HTML links to find your
> > > way to the other documents.  You'd need to build a list of URLs for the
> > > documents you want for this 2nd DB, as shown towards the end of FAQ
> > > 5.25. Then, you can select one of 2 config files from the search form,
> > > using the config input parameter (see
> > > http://www.htdig.org/hts_form.html), where each config file defines the
> > > database_dir or database_base for its own DB.
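
One way the URL list for the documents-only database might be built, sketched in Python under stated assumptions: the documents live under a local directory that the web server exposes at a known base URL, and the function name, extensions, and example paths are hypothetical. It only produces the list; how that list is fed to start_url is covered by the FAQ entry cited above.

```python
import os
from urllib.parse import quote


def document_urls(doc_root, base_url, extensions=(".doc", ".rtf")):
    """Return sorted URLs for files under doc_root with the given extensions."""
    suffixes = tuple(ext.lower() for ext in extensions)
    base = base_url.rstrip("/")
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
                # Percent-encode spaces and similar characters, keep "/".
                urls.append(base + "/" + quote(rel))
    return sorted(urls)
```

For example, document_urls("/var/www/html", "http://www.example.com/") would yield one URL per .doc or .rtf file found, which could then be written one per line to a file for the second config to use.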
> >
> > Ok, I'll try.....but first I need to understand some things 'cause my
> > ideas are a little bit confused now....Please, follow me in these
> > steps....let's begin from the beginning :-) :
> >
> > 1. Indexing
> >
> > Ok, I installed htDig from an RPM file, so I couldn't set variables as
> > $BINDIR, $DBDIR, etc....anyway, now it works with the default options....
>
> These variables determine where things are installed, but you can override
> all sorts of things in your htdig.conf, including the database_dir,
> common_dir, and many other attributes.
>


Fine!! This is good news... That way I can put the files I need 
wherever I want, right?


> > so I edit the htdig.conf file and change only some of them:
> > -external_parsers,
> > -database_dir,
> > -start_url,
> > -search_algorithm--> I kept only "exact:1" 'cause I have neither an
> > Italian dictionary to use with htDig nor a dictionary of "stop
> > words"...I've nothing but the default stuff that comes with the htdig
> > archive :-(
>
> See http://www.htdig.org/FAQ.html#q4.10 .  There are many ispell
> dictionaries available in all sorts of languages.  Surely you can find an
> Italian one. The endings algorithm is a really nice one to have, if you can
> take a bit of time to configure it for your language.
>


:-))) You are great!!! I found an Italian ispell dictionary, but I don't know 
how to install it... I think the instructions I found inside the package 
aren't meant for htdig, are they?



> > Well, let's suppose that I don't want to use the rundig script, what are
> > the steps I've to follow to index my site?
> > Ok, I think: "htdig -i -a -s -v"......
> > "-i" because I want to erase any previous indexing and I want to rebuild
> > the databases;
> > "-a" because I want to use the search engine while it is reindexing my
> > site, so I reindex the site on a second copy of the databases.
> >
> > Is everything right up to this point?
>
> Sure, but so far you're not doing anything you can't do with rundig.

:-)) Yes, I know. I'm just trying to understand that script so I can customize 
it for my needs :-)... I'm also trying to understand which databases are 
created by each program, so that I can tell whether something isn't 
working properly just by looking at the database sizes.


> If you want to reindex from scratch by using htdig -i, that's exactly what
> rundig does.  You can give -s, -v and -a options to rundig as well, and
> with -a, rundig will do all the file management needed after htmerge or
> htpurge. Generally, the only need to move away from rundig to a specialised
> script is when you want to update an existing database rather than indexing
> from scratch each time.  For a small collection such as yours, indexing
> from scratch nightly should be feasible.


Hmmmm, I'm sorry, but I can't understand what "indexing from scratch" 
means... :-( Could you explain what you meant? 
I have another problem too: even though I'm using the -a flag, I can't 
search for anything once I launch rundig... Where's the mistake? If I use the 
-a flag, does htsearch automatically handle the search using the .work 
database?


Thank you very much for your help :-)

                        Pietro Palladino
                         <[EMAIL PROTECTED]>

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html