Just solved the problem of missing locales. Better solutions can be found on FreeBSD and NetBSD, however.
Thinking a little bit about internationalization. Locales are nice, and for htdig they seem a good way to find out which characters are printable and which are not.

But DO I NEED A SPECIAL LOCALE? Sometimes yes, but preferably not. Assume:

1. You are a member of an organization that publishes a great scientific magazine. The publishing language is English, for better understanding. Your authors come from countries all around the world: many of them are Europeans, some are Chinese or Japanese, etc.
2. The magazine is to be published on the web, and you want to provide htdig as a search engine.
3. Names may contain any printable character found in ISO 8859-1 and other character sets.

Question: DO THE PEOPLE HERE RELY ON LOCALES? NO! Otherwise some names simply wouldn't be found by the search engine, especially if you want to serve special and/or different locales from htdig.

This leads to my conclusion: preferably, I don't need a locale. What I really need is a full ISO character set for general purposes. This should be the default, without any assumption about what is found on the local operating system. I only have to distinguish between printable characters that *may*be*contained*in*a*word* (e.g. a name) and characters that may not. And I only need a conversion table to map these characters to their corresponding lower-case values, as htdig wants to do when it builds its word database and index tables.

So maybe it would be better to give htdig a default behaviour of providing a complete character set for one of the most common ISO standards, e.g. ISO 8859-1. What does a locale really do? Right, the same thing: it provides a character mapping table, which describes how character codes are interpreted.

Other interesting questions: What if the charset is explicitly defined within the document, and neither htdig nor your current locale matches it? - Good question...

What is the main focus of a search engine?
- Right: index and search web documents.
- False: index documents based on assumptions about where the search engine itself lives.
- False again: ignore the content and charsets of the document itself.

Finally, it could be better to fully support UTF-8 instead of 8-bit characters only...
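
To make the idea of a locale-independent word-character table and lower-case conversion table a bit more concrete, here is a minimal sketch in Python. It is not how htdig does it; the function names (build_latin1_tables, split_words) are made up for illustration, and it assumes the input bytes really are ISO 8859-1. The tables are derived from Python's built-in Unicode data, so nothing depends on the locales installed on the local operating system.

```python
def build_latin1_tables():
    """Build two 256-entry tables for ISO 8859-1 without consulting any locale.

    word_char[b] is True if byte b may be part of a word (a letter or an
    ASCII digit). lower_table maps each byte to its lower-case byte.
    Both names are hypothetical; this is only an illustration.
    """
    word_char = [False] * 256
    lower = list(range(256))  # identity mapping by default
    for code in range(256):
        ch = bytes([code]).decode("latin-1")
        if ch.isalpha() or ch in "0123456789":
            word_char[code] = True
        low = ch.lower()
        # Keep the mapping only if the result is still one Latin-1 character.
        if len(low) == 1 and ord(low) < 256:
            lower[code] = ord(low)
    return word_char, bytes(lower)


def split_words(data, word_char, lower_table):
    """Split ISO 8859-1 bytes into lower-cased words using the tables."""
    words = []
    current = bytearray()
    for b in data:
        if word_char[b]:
            current.append(b)
        elif current:
            words.append(bytes(current).translate(lower_table))
            current.clear()
    if current:
        words.append(bytes(current).translate(lower_table))
    return words


if __name__ == "__main__":
    word_char, lower_table = build_latin1_tables()
    raw = "Jürgen MÜLLER, François Åström".encode("latin-1")
    for word in split_words(raw, word_char, lower_table):
        print(word.decode("latin-1"))
```

With tables like these, accented names are indexed and lower-cased the same way on every machine. For full UTF-8 support, the same approach would presumably work on decoded characters instead of single bytes, after decoding each document with the charset it declares.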

