To comment on the following update, log in, then open the issue: http://www.openoffice.org/issues/show_bug.cgi?id=74034
------- Additional comments from [EMAIL PROTECTED] Wed Jan 31 02:35:03 -0800 2007 -------

The first obstacle I see in using MeCab is that all available documentation seems to be in Japanese only. Who is going to maintain that? Second, as Maho mentioned, we seem to need a dictionary.

Btw, the attached patch replaces the current implementation (and somehow erroneously duplicates the BreakIterator_ko ctor/dtor). If we decide to use MeCab, I would prefer making it a configurable alternative instead, as long as we don't know whether it suits our needs or works on every platform we support.

There also seems to be room for improvement in MeCab itself: ucstable.h comes with three static encoding tables, each containing 64k short int entries consisting mostly of 0x0000, which makes up 384k of nearly wasted memory. Maybe we could reuse our own textencoding converters instead, or make use of MECAB_USE_UTF8_ONLY, which wouldn't need these tables.

Which raises another question: why is this back-and-forth conversion between UCS2 and EUC_JP (or maybe UTF-8) needed at all if MeCab internally uses UCS2 anyway? It seems to be lacking an interface for UCS2.

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
