[l10n-issues] [Issue 74034] Alternative BreakIterator_ ja based on morphological analysis

er Wed, 31 Jan 2007 02:35:04 -0800

To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=74034






------- Additional comments from [EMAIL PROTECTED] Wed Jan 31 02:35:03 -0800 2007 -------
The first obstacle I see in using MeCab is that all available documentation
seems to be in Japanese only. Who would maintain that? Second, as Maho
mentioned, we seem to need a dictionary.

Btw, the attached patch replaces the current implementation (and, apparently by
mistake, duplicates the BreakIterator_ko ctor/dtor). If we decided to use
MeCab, I would prefer making it a configurable alternative instead, as long as
we don't know whether it suits our needs or works on every platform we support.
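
To illustrate what "configurable alternative" could mean, here is a rough
sketch. Every name in it (WordBreaker, ExistingBreaker, MorphologicalBreaker,
BreakerConfig, makeJapaneseBreaker) is made up for illustration and is neither
the actual BreakIterator interface nor MeCab API. The only point is that the
alternative is opt-in and falls back to the existing code when it is not usable.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical interface for illustration; not the real BreakIterator API.
struct WordBreaker {
    virtual ~WordBreaker() {}
    virtual std::string name() const = 0;
};

// Stands in for the implementation that exists today.
struct ExistingBreaker : WordBreaker {
    std::string name() const override { return "existing"; }
};

// Stands in for a MeCab-backed implementation.
struct MorphologicalBreaker : WordBreaker {
    std::string name() const override { return "morphological"; }
};

// Hypothetical settings: an explicit opt-in plus a runtime availability check.
struct BreakerConfig {
    bool preferMorphological = false;  // set by configuration
    bool analyzerAvailable = false;    // library and dictionary found on this platform
};

// Returns the alternative only when it is requested AND usable;
// in every other case the existing implementation is kept.
std::unique_ptr<WordBreaker> makeJapaneseBreaker(const BreakerConfig& cfg) {
    std::unique_ptr<WordBreaker> result;
    if (cfg.preferMorphological && cfg.analyzerAvailable)
        result.reset(new MorphologicalBreaker);
    else
        result.reset(new ExistingBreaker);
    assert(result != nullptr);  // the factory never returns null
    return result;
}
```

With the defaults the factory returns the existing implementation, so nothing
changes on platforms where the analyzer or its dictionary is missing.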

There also seems to be room for improvement in MeCab itself: ucstable.h comes
with three static encoding tables, each containing 64k short int entries that
are mostly 0x0000. That makes up 384k of nearly wasted memory. Maybe we could
reuse our own textencoding converters instead, or make use of
MECAB_USE_UTF8_ONLY, which wouldn't need these tables. Which raises another
question: why is this back-and-forth conversion between UCS2 and EUC_JP (or
maybe UTF-8) needed at all if MeCab internally uses UCS2 anyway? It seems to
lack an interface for UCS2.
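
For reference, the 384k figure can be checked with a few lines. This is only a
sketch of the arithmetic, assuming a short int is 16 bits and each table covers
the full 0x0000..0xFFFF range. The constant and function names are mine, not
MeCab's, and countUsedEntries is a hypothetical helper for measuring how sparse
such a table really is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: one 16-bit entry per UCS-2 code point (0x0000..0xFFFF).
constexpr std::size_t kEntriesPerTable = 0x10000;          // 64k entries
constexpr std::size_t kEntrySize = sizeof(std::uint16_t);  // 2 bytes
constexpr std::size_t kTableCount = 3;

// Total static footprint of the three tables, in bytes.
constexpr std::size_t tableFootprintBytes() {
    return kTableCount * kEntriesPerTable * kEntrySize;
}

static_assert(tableFootprintBytes() == 384 * 1024,
              "three full tables occupy 384 KiB");

// Counts the entries that actually carry a mapping (non-zero values).
std::size_t countUsedEntries(const std::uint16_t* table, std::size_t n) {
    assert(table != nullptr);
    std::size_t used = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (table[i] != 0) ++used;
    }
    return used;
}
```

Running countUsedEntries over each real table would show how much of the 384k
is actually populated, which is the number that matters when deciding whether
replacing the tables is worth the effort.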

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


