To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=42660
                  Issue #:|42660
                  Summary:|Need feature for easy manual override of incorrect
                          |word breaking
                Component:|l10n
                  Version:|680m74
                 Platform:|All
                      URL:|
               OS/Version:|All
                   Status:|UNCONFIRMED
        Status whiteboard:|
                 Keywords:|
               Resolution:|
               Issue type:|FEATURE
                 Priority:|P3
             Subcomponent:|code
              Assigned to:|ft
              Reported by:|samphan





------- Additional comments from [EMAIL PROTECTED] Sat Feb 12 21:19:33 -0800 2005 -------
Algorithms for finding word breaks in Thai text are not 100% accurate. The
dictionary-based algorithm used by OOo's ICU-based line breaker gives poor
results when the text contains words that are not in the dictionary. This
happens easily, for example with new words or with transliterations of
English words. Although better algorithms or better dictionaries can
alleviate this to some extent, no algorithm is likely to be 100% accurate in
the foreseeable future. It is therefore important that users have an easy way
to manually override the word breaks that are found automatically.
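
To illustrate the failure mode, here is a minimal sketch of a greedy
longest-match segmenter. It is NOT the algorithm ICU actually uses; the
function name segment, the toy dictionary, and the English stand-in text are
all assumptions made for illustration only.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (toy sketch, not ICU's algorithm).

    Characters that do not start any dictionary word are emitted one at a
    time, which is where the spurious break positions come from.
    """
    words = []
    i = 0
    max_len = max(len(w) for w in dictionary)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to one character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical dictionary; unspaced English stands in for Thai text.
TOY_DICTIONARY = {"the", "cat", "sat", "on", "mat", "a"}

print(segment("thecatsat", TOY_DICTIONARY))    # ['the', 'cat', 'sat']
print(segment("thecatblogs", TOY_DICTIONARY))  # ['the', 'cat', 'b', 'l', 'o', 'g', 's']
```

The second call shows how one out-of-dictionary word ("blogs") can degrade
into many unwanted break positions.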

Two Unicode characters are designed for this: "zero-width space" (ZWSP,
U+200B) and "word joiner" (WJ, U+2060).

8<-- From Unicode 4.0 - Chapter 15 -->8

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary,
except that it has no width. Zero-width space characters are intended to be
used in languages that have no visible word spacing to represent word breaks,
such as Thai, Khmer, or Japanese. When text is justified, ZWSP has no effect
on letter spacing -- for example, in English or Japanese usage.

Word Joiner. U+2060 WORD JOINER behaves like U+00A0 NO-BREAK SPACE in that it
indicates the absence of word boundaries; however, the word joiner has no
width. The function of the character is to indicate that line breaks are not
allowed between the adjoining characters, except next to hard line breaks.
8<----------------------------->8
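
For reference, the two code points can be checked against the Unicode
character database with Python's standard unicodedata module. This is only a
small sketch; the helper name describe is made up here.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: marks a permitted break position
WJ = "\u2060"    # U+2060 WORD JOINER: forbids a break between its neighbours


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(ZWSP))  # U+200B ZERO WIDTH SPACE
print(describe(WJ))    # U+2060 WORD JOINER
```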

So users should be able to insert a ZWSP to add a break opportunity at a
position, and a WJ to prevent a break at a position. I think ICU should
already handle these two characters. However, users need some way to enter
the two Unicode characters into the document. For example:

Ctrl-space = Non-breaking space (normal OOo shortcut key)
Shift-space = Zero-width space
Ctrl-shift-space = Word joiner

This would allow users to easily adjust where the word breaker breaks lines,
whatever the language of the text.
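
The intended effect of the two characters can be sketched as follows. This is
a simplified model, not how OOo or ICU implement line breaking: the function
apply_overrides and the set of "automatic" break offsets are hypothetical,
and the real rules in the Unicode line breaking algorithm are more involved.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE
WJ = "\u2060"    # U+2060 WORD JOINER


def apply_overrides(marked_text, auto_breaks):
    """Return (visible_text, breaks) after applying manual overrides.

    auto_breaks holds offsets into the visible text where the automatic
    breaker would allow a line break. A ZWSP adds a break opportunity at its
    position; a WJ removes one. Both characters are dropped from the
    visible text because they have no width.
    """
    visible = []
    breaks = set(auto_breaks)
    for ch in marked_text:
        offset = len(visible)
        if ch == ZWSP:
            breaks.add(offset)
        elif ch == WJ:
            breaks.discard(offset)
        else:
            visible.append(ch)
    return "".join(visible), sorted(breaks)


# Assume the automatic breaker wrongly proposed breaks at offsets 3 and 7.
# The user adds a break after "thecat" and forbids the one inside "blogs".
marked = "thecat" + ZWSP + "b" + WJ + "logs"
print(apply_overrides(marked, {3, 7}))  # ('thecatblogs', [3, 6])
```

With shortcuts like the ones proposed above, producing the marked text would
take the user only two keystrokes.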

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

