On Tue, 21 Mar 2000 [EMAIL PROTECTED] wrote:
> db.words.db
> db.docdb
> db.docs.index
>
> Presumably, these are in some fairly-standard database format; if I could
> determine what this is, and obtain field lists, it would be a major step
> forward.
You'll be *much* happier parsing db.wordlist for the word database, which
is an ASCII file. You'll also be much happier using the -t flag for htdig
and parsing the resulting db.docs text file.
Both files have records separated by \n characters and fields separated by
tabs, with a label before each field (label:field).
The wordlist format is:
word <tab> i:DocID <tab> l:location <tab> w:weight <tab> c:count <tab> a:anchor
Note that count and anchor are optional and are dropped if they're the
default.
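
A rough sketch of parsing one db.wordlist record as described above, in
Python. The function name and dictionary keys are made up for illustration,
values are kept as strings rather than converted, and since the default
values for count and anchor aren't stated here, a missing field is simply
left as None:

```python
# Hypothetical mapping from the single-letter labels to readable names.
LABELS = {"i": "doc_id", "l": "location", "w": "weight",
          "c": "count", "a": "anchor"}

def parse_wordlist_line(line):
    """Parse one tab-separated db.wordlist record into a dict."""
    parts = line.rstrip("\n").split("\t")
    # The word itself comes first and carries no label.
    record = {"word": parts[0]}
    # Start every labelled field as None; count and anchor may be absent.
    for name in LABELS.values():
        record[name] = None
    for field in parts[1:]:
        label, sep, value = field.partition(":")
        if sep and label in LABELS:
            record[LABELS[label]] = value  # kept as a string
    return record
```

Calling parse_wordlist_line on each line of the file gives one dict per
record; whatever defaults htdig uses for count and anchor would have to be
filled in by the caller.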
The fields in db.docs are a bit more complex, but if you're willing to
read the source, they're in DocumentDB.cc under "CreateSearchDB", with the
key fields being the DocID and the URL (the first two).
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/