To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=42660

Issue #:|42660
Summary:|Need feature for easy manual override of incorrect word breaking
Component:|l10n
Version:|680m74
Platform:|All
URL:|
OS/Version:|All
Status:|UNCONFIRMED
Status whiteboard:|
Keywords:|
Resolution:|
Issue type:|FEATURE
Priority:|P3
Subcomponent:|code
Assigned to:|ft
Reported by:|samphan
------- Additional comments from [EMAIL PROTECTED] Sat Feb 12 21:19:33 -0800 2005 -------

Algorithms for finding word-breaks in Thai text are not 100% accurate. The dictionary-based algorithm used by OOo's ICU-based line-breaker gives poor results when the text contains words that are not in the dictionary. This happens easily, for example with new words or with transliterations of English words.

Although better algorithms or better dictionaries can alleviate this to some extent, no algorithm is likely to be 100% accurate in the foreseeable future. It is therefore important that users have an easy way to manually override the word-breaks that are found automatically.

Two Unicode characters are designed for this: "zero-width space" (ZWSP, U+200B) and "word joiner" (WJ, U+2060).

8<-- From Unicode 4.0 - Chapter 15 -->8

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, or Japanese. When text is justified, ZWSP has no effect on letter spacing, for example, in English or Japanese usage.

Word Joiner. U+2060 WORD JOINER behaves like U+00A0 NO-BREAK SPACE in that it indicates the absence of word boundaries; however, the word joiner has no width. The function of the character is to indicate that line breaks are not allowed between the adjoining characters, except next to hard line breaks.

8<----------------------------->8

So users should be able to insert a ZWSP to add a breakable position and a WJ to prevent a break at a position. I think ICU should already handle these two characters. However, users need some way to input the two Unicode characters into the document.
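To make the effect of the two characters concrete, here is a minimal Python sketch of what inserting them into a string looks like. It is only an illustration, not OOo or ICU code: the helper names (allow_break, forbid_break, manual_segments, strip_overrides) are hypothetical, and manual_segments is a toy that honours only the manual marks rather than a real line-breaker.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: marks a position where a break is allowed
WJ = "\u2060"    # WORD JOINER: marks a position where a break is not allowed


def allow_break(text: str, index: int) -> str:
    """Return text with a ZWSP inserted before code point `index`."""
    return text[:index] + ZWSP + text[index:]


def forbid_break(text: str, index: int) -> str:
    """Return text with a WJ inserted before code point `index`."""
    return text[:index] + WJ + text[index:]


def manual_segments(text: str) -> list[str]:
    """Toy splitter: break only at ZWSP, and drop WJ from the visible text.

    A real line-breaker (such as the ICU one mentioned above) would combine
    these manual marks with its own dictionary-based analysis.
    """
    return [part.replace(WJ, "") for part in text.split(ZWSP)]


def strip_overrides(text: str) -> str:
    """Remove all manual override characters, leaving the visible text."""
    return text.replace(ZWSP, "").replace(WJ, "")


def describe(ch: str) -> str:
    """Return the Unicode character name, e.g. for checking the constants."""
    return unicodedata.name(ch)


if __name__ == "__main__":
    # "ภาษาไทย" ("Thai language"): 4 code points for "ภาษา", 3 for "ไทย".
    marked = allow_break("ภาษาไทย", 4)
    print(manual_segments(marked))
    print(describe(ZWSP), "/", describe(WJ))
```

Both characters are invisible and have no width, so the marked string looks identical on screen; only the break behaviour is meant to change.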
For example, the shortcut keys could be:

  Ctrl-space       = Non-breaking space (normal OOo shortcut key)
  Shift-space      = Zero-width space
  Ctrl-shift-space = Word joiner

This will allow users to easily adjust where the word-breaker breaks lines, whatever the language of the text.

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
