On Tue, Sep 12, 2000 at 05:44:37PM +0900, NOKUBI Takatsugu wrote:
> I did not check umdsearch, however, it should need "word segmentation"
> process for some languages (like Japanese).
> There are no space between words in some languages. Therefore a
> boundary of words is not clear in such languages.
>
> kakasi and chasen can segment Japanese words. I don't know about other
> languages...

The way it determines a word is that it has a list of characters that can make up a word, something like [A-Za-z0-9]. A word is a sequence of in-list characters bounded on each side by a not-in-list character.
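The rule just described can be sketched in a few lines of Python. This is not udmsearch's code. It is a minimal illustration assuming a regex character class stands in for the configurable word-character list. Both lists below (`ASCII_WORD_CHARS`, `JAPANESE_WORD_CHARS`) are hypothetical, and `split_words` is a made-up helper. The Japanese case shows why segmentation is needed: with no spaces, the whole sentence comes out as a single "word".

```python
import re

# Hypothetical default list modeled on the [A-Za-z0-9] example;
# the list udmsearch actually uses may differ.
ASCII_WORD_CHARS = "A-Za-z0-9"

# Hypothetical list that also counts Hiragana, Katakana and common Kanji
# (CJK Unified Ideographs) as word characters.
JAPANESE_WORD_CHARS = ASCII_WORD_CHARS + "\u3040-\u30ff\u4e00-\u9fff"


def split_words(text, word_chars=ASCII_WORD_CHARS):
    """Return each maximal run of in-list characters in text.

    A run starts after a not-in-list character (or the start of the text)
    and stops at the next not-in-list character (or the end of the text).
    """
    return re.findall("[" + word_chars + "]+", text)


if __name__ == "__main__":
    # ASCII text: ' ', '/' and '.' are not in the list, so they split words.
    print(split_words("Debian GNU/Linux 2.2"))
    # ['Debian', 'GNU', 'Linux', '2', '2']

    # Japanese text has no spaces, so the whole sentence is one "word".
    print(split_words("日本語の検索エンジン", JAPANESE_WORD_CHARS))
    # ['日本語の検索エンジン']

    # An ideographic space (U+3000) is not in the list, so it ends a word.
    print(split_words("日本語\u3000検索", JAPANESE_WORD_CHARS))
    # ['日本語', '検索']
```

Any character left out of the list acts as a boundary, so the only way to split unspaced text this way is to have some separator in it that the list excludes.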
So if there is some byte sequence that acts as a space, you make sure it is not in the character list, and udmsearch will treat it as the end of a word.

I tried at least one of the search engines mentioned; it printed all its errors in (I assume) Japanese.

 - Craig
--
Craig Small VK2XLZ GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/ <[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]> Debian developer <[EMAIL PROTECTED]>

