Re: Test search engine

Craig Small Tue, 12 Sep 2000 16:09:13 -0500

On Tue, Sep 12, 2000 at 05:44:37PM +0900, NOKUBI Takatsugu wrote:
> I did not check udmsearch, however, it would need a "word segmentation"
> process for some languages (like Japanese).
> There are no spaces between words in some languages. Therefore the
> boundary between words is not clear in such languages.
> 
> kakasi and chasen can segment Japanese words. I don't know about other
> languages...
The way it determines a word is that it has a list of the characters that
can make up a word, something like [A-Za-z0-9]. A word is then a run of
in-list characters with a not-in-list character on either side:
not-in-list, in-list, not-in-list.

So if there is some byte sequence that equates to a space, you make sure
it is not in the character list, and udmsearch treats that as the point
where the word ends.
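
A minimal sketch of that rule in Python, to show why a character list
alone does not segment Japanese. This is not udmsearch's code; the names
WORD_CHARS and split_words are made up for illustration, and the
character list is just the [A-Za-z0-9] example from above.

```python
from itertools import groupby
import string

# Hypothetical character list, mirroring the [A-Za-z0-9] example.
WORD_CHARS = set(string.ascii_letters + string.digits)

def split_words(text, word_chars=WORD_CHARS):
    """Return each maximal run of characters that are in word_chars.

    Any character outside the list acts as a word boundary.
    """
    runs = groupby(text, key=lambda ch: ch in word_chars)
    return ["".join(run) for in_list, run in runs if in_list]

# Space, hyphen and "!" are not in the list, so they end words.
english = split_words("Test search-engine 2000!")

# Assumption for illustration: put the Japanese characters themselves
# in the list. With no separator between them, the whole phrase comes
# back as a single "word".
japanese = split_words("日本語のテキスト。", set("日本語のテキスト"))
```

Here english is split into four words, while japanese comes back as one
run, which is the case where a segmenter such as kakasi or chasen would
be needed.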

I tried at least one of the search engines mentioned; it printed all of
its errors in (I assume) Japanese.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/        <[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]>                 Debian developer <[EMAIL PROTECTED]>
