Just solved the problem of missing locales. Better
solutions can be found on FreeBSD and NetBSD, however.

Thinking a little bit about internationalization.

Locales are nice and for htdig it seems a good way
to find out what is a printable character and what
is not.

But,
  DO I NEED A SPECIAL LOCALE?

Sometimes yes. But, preferably not.

Assume:

1. you are a member of an organization that publishes a
great scientific magazine. The publishing language
would be English, for better understanding.
Your authors come from countries all around the
world; many of them are European, some are Chinese
or Japanese, etc.

2. the magazine mentioned should be published on the
web and you want to provide htdig as a search engine.

3. names may contain any printable character
that you can find in ISO 8859-1 and other character sets.

Question: DO THE PEOPLE HERE RELY ON LOCALES?

NO! Otherwise (some) names wouldn't be found by the
search engine at all, esp. if htdig is run with a special
and/or different locale.

This leads to my conclusion: I preferably don't need
a locale. What I really need is a full ISO character set
for general purpose. This should be the default and
without any assumption on what is found on the local
operating system.

I only have to distinguish between printable characters
that *may*be*contained*in*a*word* (e.g. a name) and
characters that may not. And I only need a conversion
table to convert these characters into their corresponding
lower case values, as htdig wants to do when it builds
its word database and index tables.
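
To make the idea concrete, here is a rough sketch (in Python,
not htdig's actual code) of such a locale-independent table for
ISO 8859-1. The names build_latin1_tables and split_words are
made up for illustration, and treating only letters and ASCII
digits as word characters is my assumption:

```python
def build_latin1_tables():
    """Build a word-character table and a lower-case table for ISO 8859-1.

    Nothing here depends on the locale of the local operating system.
    """
    is_word = [False] * 256
    to_lower = list(range(256))  # default: every byte maps to itself
    for code in range(256):
        ch = bytes([code]).decode("latin-1")
        # Assumption: letters and ASCII digits may be part of a word.
        if ch.isalpha() or ch in "0123456789":
            is_word[code] = True
        low = ch.lower()
        # Keep the mapping only if it stays inside the 8-bit set.
        if len(low) == 1 and ord(low) < 256:
            to_lower[code] = ord(low)
    return is_word, to_lower


def split_words(data, is_word, to_lower):
    """Split ISO 8859-1 bytes into lower-cased words using the tables."""
    words, current = [], bytearray()
    for byte in data:
        if is_word[byte]:
            current.append(to_lower[byte])
        elif current:
            words.append(current.decode("latin-1"))
            current = bytearray()
    if current:
        words.append(current.decode("latin-1"))
    return words
```

With tables like these, a name such as "MÜLLER" ends up in the
index as "müller" regardless of which locales are installed.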

So, maybe it would be a better way to provide htdig
with a default behaviour of presenting a complete
character set for one of the most common ISO standards,
e.g. ISO 8859-1.

What does a locale really do? Right, the same thing: it
provides a character mapping table, which describes
the interpretation of character codes.

Other interesting questions:

What if the charset is explicitly defined within the
document and neither htdig nor your current locale
matches it? - Good question...
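
One possible answer, sketched in Python rather than taken from
htdig: look for a charset declared in the document itself and
fall back to a fixed default only when none is usable. The
function name decode_document, the 2048-byte search window and
the ISO 8859-1 default are all assumptions of this sketch, and
it only looks at HTML meta tags, not at HTTP headers:

```python
import codecs
import re

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def decode_document(raw, default="iso-8859-1"):
    """Decode raw document bytes using the charset the document declares.

    Returns (text, charset_used). The locale of the machine running
    the indexer is never consulted.
    """
    match = _META_CHARSET.search(raw[:2048])
    name = match.group(1).decode("ascii") if match else default
    try:
        codecs.lookup(name)  # is this a charset we can decode?
    except LookupError:
        name = default
    # Undecodable bytes are replaced instead of aborting the indexing run.
    return raw.decode(name, errors="replace"), name
```

This way the document decides how its bytes are interpreted, and
the indexer's own environment is only a last resort.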

What is the main focus of a search engine?

- Right: index and search web documents.
- False: index documents based on assumptions about where the engine itself lives.
- False again: ignore the content and charsets of the document itself.

Finally, it might be better still to fully support
UTF-8 instead of 8-bit characters only...

