According to Greg Fenton:
> I am running htdig-3.2.0-2.011302 (ships with RH 7.3).

OK, just be aware that it's based on the Jan 13/02 snapshot of 3.2.0b4,
which can't handle indexing of password-protected sites.  Support for
Basic authentication was broken at that time.  Apart from that, you
should be OK for the most part.

> I have downloaded rundig.3.2.sh from the contributed work section of
> the ht://Dig website.
>
> The script contains the following:
>
>   # Move them into place. Since these are only used by htdig for update
>   # digs and we always use -a, we just leave them as .work
>   # mv $DBDIR/db.docs.index.work $DBDIR/db.docs.index
>
> Now, nowhere else in the script is anything done with the .index file.
>
> So, does this line need to be uncommented?
> Should this be a "cp" instead of a "mv"?

As the comment says, since "these are only used by htdig for update digs
and we always use -a, we just leave them as .work".  So, unless you plan
to run htdig without -a, you shouldn't need the non-.work version.
At least, according to the attrs.html documentation for this snapshot,
the doc_index attribute is used only by htdig.

However, in looking at the source code, I do see a discrepancy there.
It seems there are references to doc_index all over the place, in htdig,
htload, htmerge, and htpurge (which all make sense to me), but also in
htdump, htstat and htsearch (which doesn't make sense to me).

Geoff, would you care to comment?  Why do these programs, which
purportedly don't need db.docs.index, or at least shouldn't need it,
still seem to require it?  Is there a problem with the way the DB
handling code is structured right now that requires us to open the index
even if we don't use it?

> Is there a document as to what each db file is and how it is used in
> the overall ht://Dig process?  For example, does "htsearch" need all of
> the files in the DBDIR or are some of them used only during the
> digging/merge phases?

Theoretically, you should be able to find all these answers in the
"attrs.html" documentation for your release.
In your case, the file http://www.htdig.org/dev/htdig-3.2/attrs.html
should be pretty close, but /usr/share/doc/htdig-3.2.0/attrs.html on
your system will be even closer.  Each attribute description includes
a list of the programs that use it, so if you search for the attributes
that define the names of each database file, you should have the
information you're looking for (above discrepancies notwithstanding).

> I see that some other scripts indicate that it is not necessary to
> rebuild endings and synonym files once they have been created.  Is it
> the case that they never need to be rebuilt, or just not as often as a
> normal site recrawl?

The endings and synonyms database files are not based on the words
indexed by htdig, so you don't need to rebuild after reindexing.  This is
unlike accents, soundex and metaphone, whose databases are based on the
words in db.words.db.  You only have to rebuild the endings database if
you change english.0 or english.aff (or the dictionary and affix file for
the language of your choice, selected by endings_dictionary and
endings_affix_file), and you only have to rebuild the synonyms database
if you change your synonyms file (selected by synonym_dictionary).

The only other time you'd need to rebuild these is if the database format
itself changes.  This would happen if you use a different version of the
DB code, as when you switch from a 3.1.x release to a 3.2.0bx release or
vice-versa, or if you migrate to a different machine with different
integer size or format.

> I am trying to make rundig.3.2.sh into an efficient, flexible script to
> allow a build to take place on one machine and searching on another
> with as small a chance of downtime as possible (moving db files into
> place).
>
> I'll happily contribute what I come up with once done, but I don't feel
> I have enough knowledge yet to be sure of what I am creating...

If you're moving DB files from one machine to another, beware of
different machine architectures.
Unless all the integer and floating point formats and sizes are the same
on both machines, the databases from one will not work on the other.
In those situations, you can use htdump and htload to export and import
ASCII versions of the htdig database files.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
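
The rebuild rule for the endings and synonyms databases given earlier in
this message (rebuild only when the dictionary, affix or synonyms source
file changes) could be sketched in a script as a timestamp check.  This
is only a rough sketch and not part of rundig.3.2.sh: needs_rebuild is a
made-up helper name, and the paths in the commented usage lines assume
default file names under common_dir, so check endings_dictionary,
endings_affix_file, synonym_dictionary and the matching database
attributes in your own attrs.html before relying on them.

```shell
# Sketch only: decide from file timestamps whether a fuzzy database is
# stale.  needs_rebuild is a hypothetical helper, not an ht://Dig tool.

# needs_rebuild TARGET SOURCE...
# Returns 0 (true) if TARGET is missing or any SOURCE is newer than it,
# and 1 (false) otherwise.
needs_rebuild() {
    _nr_target=$1
    shift
    # A missing database always has to be built.
    [ -e "$_nr_target" ] || return 0
    for _nr_src in "$@"; do
        # -nt is true when the source was modified after the target.
        if [ "$_nr_src" -nt "$_nr_target" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use in a rundig-style script.  COMMONDIR and CONF are assumed
# variables, and the file names are assumed defaults:
#
#   if needs_rebuild "$COMMONDIR/word2root.db" \
#          "$COMMONDIR/english.0" "$COMMONDIR/english.aff"; then
#       htfuzzy -c "$CONF" endings
#   fi
#   if needs_rebuild "$COMMONDIR/synonyms.db" "$COMMONDIR/synonyms"; then
#       htfuzzy -c "$CONF" synonyms
#   fi
```

A timestamp check like this cannot tell that the database format itself
has changed (a different DB code version, or a move to a machine with a
different integer size or format), so those cases would still need a
forced rebuild.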

