I am using htdig 3.1.2, and my config file includes:
extra_word_characters: _
valid_punctuation: !@#$%^&*()-+|~=`{}[]:";'<>?,./
I find that the word database built by htdig includes many words that
contain or end in a comma or other punctuation. For example:
arts, i:2514 l:1 w:49950
assessed, i:2523 l:1 w:49950
atmospheric, i:2529 l:1 w:49950
b.sc, i:120 l:1 w:49950
b.sc, i:16406 l:1 w:49950
b.sc, i:16409 l:1 w:49950
b.sc, i:3039 l:1 w:49950
b.sc, i:3040 l:1 w:49950
b.sc, i:3041 l:1 w:49950
ba, i:17 l:1 w:49950
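To gauge how widespread this is, a small script can list the distinct entries whose word contains anything other than letters, digits, or the configured extra_word_characters. This is only a sketch: it assumes a plain-text dump with one entry per line in the format shown above (the word first, then whitespace-separated fields), and the function name find_punctuated_words is made up for illustration. It is not part of htdig.

```python
def find_punctuated_words(lines, extra_word_chars="_"):
    """Return distinct words containing characters that are neither
    alphanumeric nor listed in extra_word_chars (assumed dump format:
    'word i:N l:N w:N', one entry per line)."""
    flagged = []
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]  # the word is the first whitespace-separated field
        if word in seen:
            continue  # report each distinct word only once
        seen.add(word)
        if any(not (ch.isalnum() or ch in extra_word_chars) for ch in word):
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    sample = [
        "arts, i:2514 l:1 w:49950",
        "b.sc, i:120 l:1 w:49950",
        "b.sc, i:16406 l:1 w:49950",
        "ba, i:17 l:1 w:49950",
    ]
    for word in find_punctuated_words(sample):
        print(word)
```

Running it over the full dump rather than the four sample lines would show whether the problem is limited to commas and periods or affects other characters from valid_punctuation as well.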
Am I misunderstanding the documentation on "valid_punctuation"?
I can't figure out how the configuration file attributes
extra_word_characters and valid_punctuation work together.
What happens when the same character is in both?
Why doesn't the documented list of default characters for
valid_punctuation include the question mark (?) and the double quote (")?
What separates words? Is it whitespace only?
Thanks
--
David Adams
Computing Services
University of Southampton