On Tue, 1 Jun 2004, Daniel Almendra wrote:

>     I couldn't find information about the flags in the db.worddump
> file (generated using the option -t in htdig). What do they mean exactly?
>     In the site FAQ, I've found only information about the db.wordlist
> file, created by htdig 3.1.x versions or earlier, but I'm running
> version 3.2.0b4...

For 3.2.0b?, you could check http://www.htdig.org/dev/htdig-3.2/htdump.html
However, in this case I am not sure that it will be of much help.

>     The specific flags I couldn't understand were:

I am not 100% certain on all of these, but hopefully someone will correct
me if I am too far off.

>     -id: this field is always "0" in my dumps;

I think you are referring to 'document id'; at least that is how it shows
up when I perform a dump. This should most definitely not be 0 in all of
your dumps. This is a unique id that maps back to the document that
contains the corresponding term.

>     -flags: if this should be the number of occurrences of the word in
> the document, it's generally incorrect in my dumps;

I believe the flags field encodes context information associated with the
term, such as whether it came from a title element, an h? element, a meta,
etc. I think this value corresponds to the 'factor' array initialized
in the 'Retriever' constructor (Retriever.cc). I don't think this value
has anything to do with the number of occurrences.

>     -location: this field is also "0" for all words;

This value should correspond to where the word occurs in the document. As
with document id, it should not be 0 for all words. Are you sure that you
are reading the columns correctly? They don't actually match up with the
headers in terms of horizontal placement.
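
One way to rule out a column-reading problem is to split each line on its
delimiter instead of going by the header alignment. Below is a minimal
Python sketch. It assumes that the dump is tab-separated, that the columns
come in the order word, document id, flags, location, anchor, and that
header lines start with '#'. None of that is confirmed here, so please check
it against your own db.worddump first. The names WordEntry and
parse_worddump_line are made up for this example and are not part of
ht://Dig.

```python
from typing import NamedTuple, Optional


class WordEntry(NamedTuple):
    """One row of the dump (hypothetical type; column order is an assumption)."""
    word: str
    doc_id: int
    flags: int
    location: int
    anchor: Optional[int]  # None when the anchor column is empty or missing


def parse_worddump_line(line: str) -> Optional[WordEntry]:
    """Parse one line of the dump, returning None for blank and header lines."""
    line = line.rstrip("\r\n")
    # Assumption: header/comment lines begin with '#'.
    if not line.strip() or line.startswith("#"):
        return None

    # Assumption: fields are separated by single tab characters.
    fields = line.split("\t")
    if len(fields) == 4:
        # Tolerate a missing trailing anchor column.
        fields.append("")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")

    word, doc_id, flags, location, anchor = fields
    return WordEntry(
        word=word,
        doc_id=int(doc_id),
        flags=int(flags),
        location=int(location),
        anchor=int(anchor) if anchor.strip() else None,
    )
```

Running this over every line and printing the distinct doc_id and location
values should show quickly whether they really are all 0, or whether the
columns were just being read against the wrong headers.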

>     -anchor: this field is empty for all entries.

I think that this one is an index into the list of document anchors stored
in the document database (i.e. the 'A' fieldname documented on the htload
page).

Jim


