https://bugs.freedesktop.org/show_bug.cgi?id=54843

             Bug #: 54843
           Summary: Bad righthyphenmin for 3-byte or more UTF-8 multibyte
                    characters
    Classification: Unclassified
           Product: LibreOffice
           Version: 3.7.0.0.alpha0+ Master
          Platform: Other
        OS/Version: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Linguistic
        AssignedTo: [email protected]
        ReportedBy: [email protected]


Created attachment 67077
  --> https://bugs.freedesktop.org/attachment.cgi?id=67077
Telugu test example

(From the bug report by Steven Dickson:)

There appears to be a logic error in the hnj_hyphen_rhmin function in the file
hyphen.c.  The function is supposed to remove hyphens from the right-hand side
of a word based on the value of RIGHTHYPHENMIN defined in the hyphenation
pattern file for the language.  It works properly for words containing only
single-byte characters, but can fail if the word contains multi-byte
characters.


The code erroneously assumes that the last character of the word is a
single-byte character and starts scanning the word at the next to last byte of
the word.  This can be corrected by initializing the character count variable,
i, to 0 rather than 1 and starting the for loop with j = word_size - 1 rather
than j = word_size - 2.


The code also erroneously increments the character count variable, i, while
still inside of a multi-byte character. This can be corrected by only
incrementing i when at the first byte of a multi-byte character
((word[j] & 0xc0) == 0xc0) or when at a single-byte character
((word[j] & 0x80) != 0x80).  The inner parentheses are required because ==
and != bind more tightly than & in C.
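
The byte test described above can be sketched as follows.  The helper names
starts_utf8_char and count_utf8_chars are made up for this illustration and
are not part of hyphen.c; the sketch assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical helper, not from hyphen.c: returns 1 if byte b begins a
 * character, i.e. it is a single-byte (ASCII) character or the lead byte
 * of a multi-byte sequence.  Continuation bytes (10xxxxxx) return 0. */
static int starts_utf8_char(unsigned char b)
{
    return (b & 0x80) != 0x80 || (b & 0xc0) == 0xc0;
}

/* Counts characters (not bytes) in the first n bytes of s, assuming
 * well-formed UTF-8: each character contributes exactly one byte for
 * which starts_utf8_char() is true. */
static int count_utf8_chars(const char *s, size_t n)
{
    int count = 0;
    size_t k;

    for (k = 0; k < n; k++)
        if (starts_utf8_char((unsigned char)s[k]))
            count++;
    return count;
}
```

Counting one byte per character this way gives the same result whether the
bytes are visited left to right or right to left, which is what the
right-to-left scan in the report relies on.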

A diff of hyphen.c with the corrections follows.

737c737
<     int i = 1;
---
>     int i = 0;
743c743
<     for (j = word_size - 2; i < rhmin && j > 0; j--) {
---
>     for (j = word_size - 1; i < rhmin && j > 0; j--) {
756c756
<        if (!utf8 || (word[j] & 0xc0) != 0xc0) i++;
---
>        if (!utf8 || (word[j] & 0xc0) == 0xc0 || (word[j] & 0x80) != 0x80) i++;
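
To see the effect of the three changed lines in isolation, here is a
simplified, standalone sketch of the corrected scan.  It is not the actual
hnj_hyphen_rhmin: the name clear_right_hyphens is hypothetical, and the sketch
leaves out everything in the real function that the diff does not show.  It
assumes that hyphens[j] holds an odd digit when a break is allowed after byte
j of the word, and that a break after a multi-byte character is recorded at
that character's last byte.

```c
/* Simplified sketch (hypothetical name, not the hyphen.c function):
 * clears every break position that would leave fewer than rhmin
 * characters on the right-hand side of the word. */
static void clear_right_hyphens(int utf8, const char *word, int word_size,
                                char *hyphens, int rhmin)
{
    int i = 0;  /* characters seen so far, counted from the right end */
    int j;

    for (j = word_size - 1; i < rhmin && j > 0; j--) {
        unsigned char b = (unsigned char)word[j];

        /* A break after byte j would leave only i characters to its
         * right, and i < rhmin here, so disallow it. */
        hyphens[j] = '0';

        /* Count a character only at an ASCII byte or a UTF-8 lead byte,
         * never at a continuation byte. */
        if (!utf8 || (b & 0xc0) == 0xc0 || (b & 0x80) != 0x80)
            i++;
    }
}
```

As an arbitrary test input, take the 11-byte word "ab" followed by the three
3-byte Telugu letters U+0C05, U+0C06 and U+0C07, with breaks initially allowed
after bytes 1, 4 and 7 (hyphens = "01001001000") and rhmin = 2.  The sketch
clears the break at byte 7, which would leave a single character on the right,
and keeps the break at byte 4, giving "01001000000".  Tracing the original
values from the diff (i = 1, start at word_size - 2, increment on every
non-lead byte) over the same input, i reaches 2 after one continuation byte,
so the loop stops at byte 9 and the break at byte 7 survives.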

