Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: Status of new search engine
Date: Tue, 17 Dec 2002 20:46:36 +0900 (JST)

> I heard that "namazu" can be used for such purpose, i.e., constructing
> a whole-text search engine for Japanese. It is a free software and
> available as a Debian package. Namazu is very popular not only among
> Japanese free software community but also among commercial usages.

Sorry, this is not exact. Namazu is a search engine, but it doesn't
have a sentence analyzer. It needs external software such as ChaSen
    http://chasen.aist-nara.ac.jp/index.html.en
or Kakasi
    http://kakasi.namazu.org/index.html.en

If your search engine has some mechanism for plugging in a
word-separation procedure, you may want to ask someone about these
tools. I expect there are Debian members from Japan who know them
well.

There are several languages in the world whose sentences don't use
whitespace to separate words. For example,
"Thereareseverallanguagesintheworldwhosesentencesdon'tusewhitespacetoseparatewords."
Among the languages of the Debian web pages, Japanese and Chinese are
such languages. So is Thai, though the Debian web pages are not yet
available in Thai. Korean is similar to Japanese, but modern Korean
does use whitespace between words.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
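
P.S. To make the word-separation problem concrete, here is a minimal
sketch in Python. It is only an illustration under my own assumptions:
the function names and the toy vocabulary are made up, and the greedy
longest-match approach is NOT how ChaSen or Kakasi work internally. It
just shows why splitting on whitespace fails for such text and what a
pluggable word-separation step might look like.

```python
# Hypothetical illustration only; names and vocabulary are invented,
# and this is not the algorithm used by ChaSen or Kakasi.


def whitespace_tokenize(text):
    """Split on whitespace; enough for English or modern Korean."""
    return text.split()


def longest_match_tokenize(text, vocabulary):
    """Greedy longest-match segmentation against a known word list.

    Characters that start no known word are emitted one at a time.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy vocabulary, made up for this example.
VOCAB = {"there", "are", "several", "languages"}

# Whitespace splitting sees the whole string as a single "word".
print(whitespace_tokenize("thereareseverallanguages"))
# The dictionary-based segmenter recovers the individual words.
print(longest_match_tokenize("thereareseverallanguages", VOCAB))
```

A real analyzer also has to deal with unknown words and ambiguous
splits, which is why an indexer would delegate this step to an
external tool rather than to a word list like the one above.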

