According to Thilo Bauer:
> Thinking a little bit about internationalization.
> 
> Locales are nice and for htdig it seems a good way
> to find out what is a printable character and what
> is not.
> 
> But,
>   DO I NEED A SPECIAL LOCALE?
> 
> Sometimes yes. But, preferably not.
> 
> Assume:
> 
> 1. You are a member of an organization that publishes a
> great scientific magazine. The publishing language
> is English, for better understanding.
> All your authors come from countries all around the
> world; many of them are European, some are Chinese
> or Japanese, etc.
> 
> 2. The magazine mentioned should be published on the
> web, and you want to provide htdig as a search engine.
> 
> 3. Names may contain any printable character
> found in ISO 8859-1 and other character sets.
> 
> Question: DO THE PEOPLE HERE RELY ON LOCALES?
> 
> NO! Otherwise (some) names in general wouldn't be found
> in the search engine, esp. if you want to provide special
> and/or different locales from htdig.
> 
> This leads to my conclusion: I preferably don't need
> a locale. What I really need is a full ISO character set
> for general purpose. This should be the default and
> without any assumption on what is found on the local
> operating system.
> 
> I only have to distinguish between printable characters
> that *may*be*contained*in*a*word* (e.g. a name) and
> characters that may not. And I only need a conversion
> table to convert these characters into their corresponding
> lower case values, as htdig does when it builds
> its word database and index tables.
> 
> So, maybe it would be a better way to provide htdig
> with a default behaviour of presenting a complete
> character set for one of the most common ISO standards,
> e.g. ISO 8859-1.
> 
> What does a locale really do? Right, the same thing: it
> provides a character mapping table, which describes
> the interpretation of character codes.
> 
> Other interesting questions:
> 
> What if the charset is explicitly defined within the
> document and neither htdig nor your current locale
> matches it? - Good question...
> 
> What is the main focus of a search engine?
> 
> - Right: index and search web documents.
> - False: index documents based on assumptions about where the engine itself lives.
> - False again: ignore the content and charsets of the document itself.
> 
> Finally it could be a better way to fully support
> UTF-8 instead of 8-bit characters only...
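
To make the quoted proposal concrete, here is a minimal sketch of a fixed,
locale-independent table for ISO 8859-1: one table says which byte values may
appear in a word, the other maps each byte to its lower case value. It is
written in Python for brevity and is not htdig code; the names
build_latin1_tables, IS_WORD, TO_LOWER and split_words are hypothetical, and
the choice of which characters count as word characters (digits and letters
only) is an assumption.

```python
def build_latin1_tables():
    """Return (is_word, to_lower): two 256-entry tables for ISO 8859-1 bytes."""
    is_word = [False] * 256
    to_lower = list(range(256))  # identity mapping by default

    # ASCII digits count as word characters (an assumption of this sketch).
    for b in range(0x30, 0x3A):
        is_word[b] = True
    # ASCII upper case letters map to the lower case form 0x20 higher.
    for b in range(0x41, 0x5B):
        is_word[b] = True
        to_lower[b] = b + 0x20
    # ASCII lower case letters.
    for b in range(0x61, 0x7B):
        is_word[b] = True

    # Letters in 0xC0-0xFF, skipping the multiplication sign (0xD7)
    # and the division sign (0xF7), which are not letters.
    for b in range(0xC0, 0x100):
        if b in (0xD7, 0xF7):
            continue
        is_word[b] = True
        # 0xC0-0xDE are upper case; 0xDF-0xFF are already lower case.
        if b <= 0xDE:
            to_lower[b] = b + 0x20
    return is_word, to_lower


IS_WORD, TO_LOWER = build_latin1_tables()


def split_words(data: bytes):
    """Split ISO 8859-1 bytes into lower-cased words using only the tables."""
    words, current = [], bytearray()
    for b in data:
        if IS_WORD[b]:
            current.append(TO_LOWER[b])
        elif current:
            words.append(bytes(current))
            current = bytearray()
    if current:
        words.append(bytes(current))
    return words
```

Nothing here consults the operating system's locale, so the result is the same
on every machine. The obvious limit is that the table only covers one 8-bit
character set.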

You make some very good points, but I think they've all been covered in
previous discussions on the topic.  I suggest you search through the
mailing list archives to catch up on earlier discussions.

You don't need to convince us of the need for all this - we already know
that we need to be able to support an expanded character set like UTF-8
or Unicode, independently of system locale definitions, to get htdig to
work properly with any language.  What we need is someone who's willing
to tackle this non-trivial project.
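
As a rough illustration of the direction described above, the following sketch
decodes a document using the charset it declares, falls back to ISO 8859-1
when the declaration is missing or unusable, and then splits and lower-cases
words on Unicode text without consulting the system locale. It is Python, not
htdig code, and only a sketch under stated assumptions: the names index_words,
DEFAULT_CHARSET and WORD_RE are hypothetical, and the fallback policy is an
assumption rather than anything htdig does.

```python
import re

# Hypothetical default, mirroring the ISO 8859-1 default suggested in the
# quoted message.
DEFAULT_CHARSET = "iso-8859-1"

# On str patterns, \w matches Unicode word characters (letters, digits and
# the underscore) independently of the system locale.
WORD_RE = re.compile(r"\w+")


def index_words(raw: bytes, declared_charset=None):
    """Decode raw document bytes and return the lower-cased words."""
    charset = declared_charset or DEFAULT_CHARSET
    try:
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or bytes that are invalid in that charset:
        # fall back to the default. ISO 8859-1 assigns a character to every
        # byte value, so this decode cannot fail.
        text = raw.decode(DEFAULT_CHARSET)
    return [word.lower() for word in WORD_RE.findall(text)]
```

For example, index_words(data, "utf-8") handles a UTF-8 page, while
index_words(data) treats the bytes as ISO 8859-1. The hard part of the real
project, adapting the word database and the existing 8-bit code paths, is not
touched by this sketch.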

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
