Alexander Bokovoy wrote:

> I meant that they lack needed functionality, which is different from
> 'lack i18n support'. The search engines (I mean free search engines,
> of course) other than UdmSearch cannot handle texts in one language
> that are encoded in different existing encodings. This is a very
> common situation with Cyrillic and Asian languages. Moreover, for
> Cyrillic we have 21 different encodings (either 7-bit or 8-bit),
> though only 4-6 are in actual use on the Internet. A search engine
> should detect the correct encoding, re-encode pages into one encoding,
> and index them only after that. HtDig lacks such support out of the
> box, and the known 'russification' patches have nothing to do with
> handling several encodings. The same goes for Asian languages (except
> that they have three or four multi-byte encodings). The other search
> engines I saw either have no idea how to work with anything that is
> not Latin-1, or have faulty support for other options such as sorting
> according to the national alphabet.

Ah. Another fine mess I got myself into, it seems. The patch is updated,
again, but I'd like those interested to contact me directly. Searching
is not something that is supported now so I won't keep bothering the
list with it unless I have relevant questions or announcements.

If anyone knows an open source indexing/searching engine that does
support the above features, please let me know.

How does one encode multibyte chars, anyway? Currently I feed the
indexer simple HTML pages that I build from the articles/topics/pages,
but the HTML I add to help the indexer is plain ASCII, so if the Midgard
content is encoded otherwise the result will be a page in mixed
encodings.
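
To make the question concrete, here is a rough Python sketch of the
detect / re-encode / index-afterwards idea from the quoted text, applied
to the pages I build. Everything in it is an assumption for illustration:
the candidate list, the names score, guess_decode and build_index_page,
and above all the detection heuristic (running Russian text is mostly
lowercase Cyrillic), which is naive and not what UdmSearch or any other
engine is known to do.

```python
import html

# Assumed candidate encodings, tried in this order; purely illustrative.
CANDIDATES = ["utf-8", "koi8-r", "cp1251", "cp866", "iso-8859-5"]


def score(text):
    """Naive heuristic: real Russian text has far more lowercase than
    uppercase Cyrillic letters, while a wrong 8-bit decoding tends to
    swap case or produce non-letters."""
    lower = sum(1 for ch in text if "а" <= ch <= "я")
    upper = sum(1 for ch in text if "А" <= ch <= "Я")
    return lower - upper


def guess_decode(raw):
    """Return (encoding, text) for the best-scoring candidate."""
    best = None
    for enc in CANDIDATES:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        s = score(text)
        # Strict '>' keeps the earliest candidate on ties, so plain
        # ASCII input is reported as utf-8.
        if best is None or s > best[0]:
            best = (s, enc, text)
    if best is None:
        raise ValueError("no candidate encoding could decode the input")
    return best[1], best[2]


def build_index_page(title, raw_body):
    """Wrap the content in minimal HTML, with markup and content in one
    declared encoding (UTF-8) so the indexer never sees a mixed page."""
    _, body = guess_decode(raw_body)
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8">'
        f"<title>{html.escape(title)}</title></head>"
        f"<body>{html.escape(body)}</body></html>"
    )
    return page.encode("utf-8")
```

The point is only that the markup I add and the stored content end up
in the same encoding before the indexer sees them; a real detector
would need something better than counting letter case.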

Bye,
Emile

--
This is The Midgard Project's mailing list. For more information,
please visit the project's web site at http://www.midgard-project.org

To unsubscribe from the list, send an empty email message to the address
[EMAIL PROTECTED]
