According to Tomas Frydrych:
> I do have one question though; when defining valid_punctuation, do
> I have to include ' ' (i.e. space), or is ' ' always included, and if I
> have to include it explicitly, where/how do I put it in the string?

No, white space characters (space, tab, newline) are treated separately
from valid_punctuation and from any other punctuation characters. The
htdig parser uses the C library function isspace() to test whether a
character is white space. The set of such characters is usually defined
by your locale, although with any ASCII or ISO character set it will be
pretty much the standard three characters above, plus a few more obscure
ones such as carriage return, form feed and vertical tab.

It would not make sense to add a space to valid_punctuation, nor can you.
The valid_punctuation characters are those that are allowed within a
compound word. Historically, a word like "post-doctoral" was indexed only
as "postdoctoral" if the "-" was in valid_punctuation. In more recent
versions, it is indexed as "postdoctoral", "post" and "doctoral".

So you can see that valid_punctuation characters have a special meaning
within a word. They don't cause a distinct break between words the way
that any other punctuation character would, or the way that white space
would. E.g. the comma "," is not normally included in valid_punctuation,
so it always breaks words apart, while the hyphen or apostrophe can
appear within a word (in English, in any case).

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
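
As a rough illustration of the word-splitting behaviour described in the reply above, here is a minimal sketch in Python. It is not htdig's actual parser: the function names index_terms and _emit are hypothetical, Python's str.isspace() and str.isalnum() stand in for the C library tests, and the order in which terms are emitted is an assumption. It only models the "more recent versions" rule, where a compound word is indexed both joined and as its separate parts.

```python
def _emit(word, valid_punctuation, terms):
    """Append the index terms for one raw word (hypothetical helper)."""
    # Split the raw word on valid_punctuation characters.
    parts = []
    current = ""
    for ch in word:
        if ch in valid_punctuation:
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)

    if not parts:
        return  # the word consisted only of punctuation
    # Compound form with the valid punctuation removed, e.g. "postdoctoral".
    terms.append("".join(parts))
    # For a true compound, also emit each part, e.g. "post" and "doctoral".
    if len(parts) > 1:
        terms.extend(parts)


def index_terms(text, valid_punctuation="-'"):
    """Return the terms this simplified model would index for `text`."""
    terms = []
    word = ""
    for ch in text:
        if ch.isspace():
            # White space is tested first (assumption), so listing a space
            # in valid_punctuation has no effect: it always ends the word.
            _emit(word, valid_punctuation, terms)
            word = ""
        elif ch.isalnum() or ch in valid_punctuation:
            # Letters, digits and valid punctuation stay inside the word.
            word += ch
        else:
            # Any other punctuation (e.g. a comma) breaks words apart.
            _emit(word, valid_punctuation, terms)
            word = ""
    _emit(word, valid_punctuation, terms)
    return terms


print(index_terms("post-doctoral fellow, tenured"))
print(index_terms("post-doctoral", valid_punctuation=""))
```

With "-" in valid_punctuation, the first call yields the joined form and both parts of the compound; with an empty valid_punctuation, the hyphen acts like any other word break. Options such as a minimum word length or case folding are deliberately left out of this sketch.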
