Emile Heyns wrote:
>
> Alexander Bokovoy wrote:
>
> > support and omit it when they not needed. But, again, supporting
> > Midgard-based servers at the search engine core will be preferred
> > solution.
>
> I'll fiddle with it some more to produce some actual output, then I'll
> talk to them. An updated patch, which adds a --with-midgard flag to
> configure, is at
> http://www.iris-advies.com/php/Emile.Heyns/udmsearch-2.2.1-midgard.patch
Ok, I'll look at it.
> > None of the other search engines has i18n support out of the box with
> > the functionality needed to support languages with several encodings
> > (like Russian or Japanese), whether one-byte or multi-byte.
>
> Bummer. None at all? Anyway, the search system for midgard shall be
> independent of the actual search engine used. The indexer will need
> per-search engine changes, of course. I'll work with udmsearch for the
> while.
I meant that they lack the needed functionality, which is different from
'lacking i18n support'. The search engines (I mean free search engines, of
course) other than UdmSearch cannot handle texts written in one language
but stored in several different encodings. This is a very common situation
with Cyrillic and Asian languages. Moreover, for Cyrillic we have 21
different encodings (either 7-bit or 8-bit), though only 4-6 are in actual
use on the Internet. A search engine should detect the correct encoding,
re-encode pages into one encoding, and index them only after that. HtDig
lacks such support out of the box, and the known 'russification' patches
have nothing to do with handling several encodings. The same goes for
Asian languages (except that they have three or four multi-byte
encodings). The other search engines I have seen either have no idea how
to work with anything that is not Latin-1, or have broken support for
other things such as sorting according to the national alphabet.
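
To make the detect / re-encode / index order concrete, here is a rough
sketch in Python. It is not how UdmSearch does it; the function names, the
list of four candidate encodings and the letter-frequency heuristic are
all my own assumptions for illustration, and a real detector would need
much better statistics.

```python
# Illustrative sketch only: names, candidate list and heuristic are
# assumptions, not taken from UdmSearch or any other search engine.

# Hypothetical choice of Cyrillic encodings to try.
CANDIDATES = ["koi8-r", "cp1251", "cp866", "iso8859-5"]

# Some of the most frequent lowercase Russian letters.
COMMON = set("оеаинтсрвл")


def score(text):
    """Count frequent lowercase Russian letters in a decoded text."""
    return sum(1 for ch in text if ch in COMMON)


def detect_encoding(raw, candidates=CANDIDATES):
    """Return the candidate whose decoding looks most like Russian."""
    best, best_score = None, -1
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        s = score(text)
        if s > best_score:
            best, best_score = enc, s
    return best


def to_unicode(raw):
    """Step 1 and 2: detect the encoding, then re-encode to one form."""
    return raw.decode(detect_encoding(raw))


def build_index(pages):
    """Step 3: index pages only after they share one encoding.

    pages maps a page id to the raw bytes of that page; the result maps
    each lowercased word to the set of page ids containing it.
    """
    index = {}
    for page_id, raw in pages.items():
        for word in to_unicode(raw).lower().split():
            index.setdefault(word, set()).add(page_id)
    return index
```

With this, two pages holding the same Russian word, one stored in KOI8-R
and one in Windows-1251, end up under the same index entry, which is
exactly what an engine that indexes raw bytes cannot do.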
In brief, this is a very big problem and another long-discussed story.
Also, this problem has always been very close to the Apache i18n
discussions which, in fact, resulted in the mod_charset module with a very
flexible interface, both in the configuration files and in the internal
Apache (and PHP) API.
--
Sincerely yours,
Alexander Bokovoy
<!-- 2:450/144.58 --- bokovoyATminsk.lug.net --- FractalsAtTheEdge -->
--
This is The Midgard Project's mailing list. For more information,
please visit the project's web site at http://www.midgard-project.org
To unsubscribe the list, send an empty email message to address
[EMAIL PROTECTED]