Re: [htdig] question about rundig.3.2.sh

Gilles Detillieux Wed, 25 Sep 2002 08:39:41 -0700

According to Greg Fenton:
> I am running htdig-3.2.0-2.011302 (ships with RH 7.3).


OK, just be aware that it's based on the Jan 13/02 snapshot of 3.2.0b4,
which can't handle indexing of password-protected sites.  Support for
Basic authentication was broken at that time.  Apart from that, you should
be OK for the most part.

> I have downloaded rundig.3.2.sh from the contributed work section of
> the ht://Dig website.
> 
> The script contains the following:
> 
>   # Move them into place. Since these are only used by htdig for update
>   # digs and we always use -a, we just leave them as .work
>   # mv $DBDIR/db.docs.index.work $DBDIR/db.docs.index
> 
> Now, nowhere else in the script is anything done with the .index file.
> 
> So, does this line need to be uncommented?
> Should this be a "cp" instead of a "mv"?

As the comment says, since "these are only used by htdig for update digs
and we always use -a, we just leave them as .work".  So, unless you plan
to run htdig without -a, you shouldn't need the non-.work version.  At
least, according to the attrs.html documentation for this snapshot, the
doc_index attribute is used only by htdig.

However, in looking at the source code, I do see a discrepancy there.
It seems there are references to doc_index all over the place, in
htdig, htload, htmerge, and htpurge (which all make sense to me), but
also in htdump, htstat and htsearch (which doesn't make sense to me).

Geoff, would you care to comment?  Why do these programs, which purportedly
don't need db.docs.index, or at least shouldn't need it, still seem to
require it?  Is there a problem with the way the DB handling code is
structured right now that requires us to open the index even if we don't
use it?

> Is there a document as to what each db file is and how it is used in
> the overall ht://Dig process?  For example, does "htsearch" need all of
> the files in the DBDIR or are some of them used only during the
> digging/merge phases?

Theoretically, you should be able to find all these answers in the
"attrs.html" documentation for your release.  In your case, the file
http://www.htdig.org/dev/htdig-3.2/attrs.html should be pretty close,
but /usr/share/doc/htdig-3.2.0/attrs.html on your system will be even
closer.  Each attribute description includes a list of the programs
that use it, so if you search for the attributes that define the names
of each database file, you should have the information you're looking
for (above discrepancies notwithstanding).
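
As a rough sketch of that search, assuming each attribute in attrs.html
is introduced by an HTML anchor of the form name="attribute" (an
assumption about the page layout, so check it against your copy),
something like the following prints the lines that follow an attribute's
anchor, which should include its list of programs.  The function name
show_attr and the count of 15 lines are arbitrary choices for
illustration, and the -A option needs a grep that supports it (GNU grep
does).

```shell
#!/bin/sh
# Hypothetical helper: show the documentation block for one attribute.
# Assumes attrs.html marks each attribute with name="<attribute>".
show_attr() {
    attrs_file=$1
    attr=$2
    # -A 15 prints the matching line plus up to 15 lines after it.
    grep -A 15 "name=\"$attr\"" "$attrs_file"
}

# Example:
#   show_attr /usr/share/doc/htdig-3.2.0/attrs.html doc_index
```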

> I see that some other scripts indicate that it is not necessary to
> rebuild endings and synonym files once they have been created.  Is it
> the case that they never need to be rebuilt, or just not as often as a
> normal site recrawl?

The endings and synonyms database files are not based on the words
indexed by htdig, so you don't need to rebuild after reindexing.  This is
unlike accents, soundex and metaphone, whose databases are based on the
words in db.words.db.  You only have to rebuild the endings database if
you change english.0 or english.aff (or the dictionary and affix file
for the language of your choice, selected by endings_dictionary and
endings_affix_file), and you only have to rebuild the synonyms database
if you change your synonyms file (selected by synonym_dictionary).
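
In a rundig-style script, that rule could be sketched as a timestamp
check: rebuild only when the database is missing or older than the files
it was built from.  The helper name needs_rebuild is made up for this
example, and the file names in the commented usage are placeholders; take
the real ones from your configuration file.  The -nt test is not strictly
POSIX, but bash, ksh and dash all support it.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the target database is
# missing or older than any of the source files it was built from.
needs_rebuild() {
    target=$1
    shift
    # A missing database always needs to be built.
    [ -f "$target" ] || return 0
    for src in "$@"; do
        # Any source newer than the database means a rebuild is due.
        [ "$src" -nt "$target" ] && return 0
    done
    return 1
}

# Example use (paths are placeholders, not taken from any real setup):
#   if needs_rebuild "$COMMONDIR/word2root.db" \
#          "$COMMONDIR/english.0" "$COMMONDIR/english.aff"; then
#       htfuzzy -c "$CONF" endings
#   fi
#   if needs_rebuild "$COMMONDIR/synonyms.db" "$COMMONDIR/synonyms"; then
#       htfuzzy -c "$CONF" synonyms
#   fi
```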

The only other time you'd need to rebuild these is if the database
format itself changes.  This would happen if you use a different version
of the DB code, as when you switch from a 3.1.x release to a 3.2.0bx
release or vice-versa, or if you migrate to a different machine with
different integer size or format.

> I am trying to make rundig.3.2.sh into an efficient, flexible script to
> allow a build to take place on one machine and searching on another
> with as small a chance of downtime as possible (moving db files into
> place).
> 
> I'll happily contribute what I come up with once done, but I don't feel
> I have enough knowledge yet to be sure of what I am creating...

If you're moving DB files from one machine to another, beware of different
machine architectures.  Unless all the integer and floating point formats
and sizes are the same on both machines, the databases from one will not
work on the other.  In those situations, you can use htdump and htload to
export and import ASCII versions of the htdig database files.
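
A minimal sketch of that export/import step, assuming both programs
accept -c to name the configuration file and that the names of the ASCII
dump files come from that configuration (check the htdump and htload
documentation for your snapshot).  The function names, the DRY_RUN
switch and the default config path are placeholders for illustration.

```shell
#!/bin/sh
# Sketch only: the config file location is a placeholder.
CONF=${CONF:-/etc/htdig/htdig.conf}

# On the build machine: write ASCII dumps of the databases.
# With DRY_RUN set to a non-empty value, only print the command.
dump_db() {
    ${DRY_RUN:+echo} htdump -c "$CONF"
}

# On the search machine, after copying the dump files over:
# rebuild the binary databases from the ASCII dumps.
load_db() {
    ${DRY_RUN:+echo} htload -c "$CONF"
}
```

Running with DRY_RUN=1 first is a cheap way to confirm which
configuration file each step will use before touching any databases.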

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
