To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=41792
                  Issue #:|41792
                  Summary:|Incorrect Handling of Surrogate Pair
                Component:|gsl
                  Version:|OOo 1.1.2
                 Platform:|PC
                      URL:|
               OS/Version:|Linux
                   Status:|UNCONFIRMED
        Status whiteboard:|
                 Keywords:|
               Resolution:|
               Issue type:|DEFECT
                 Priority:|P3
             Subcomponent:|code
              Assigned to:|cp
              Reported by:|xieqian





------- Additional comments from [EMAIL PROTECTED] Tue Feb  1 03:46:00 -0800 2005 -------
This bug is related to issue 40391. While testing non-BMP character support in 
OpenOffice.org 1.1.2 for Linux, as shipped with Fedora Core 3, I found problems 
with surrogate pair handling. Besides the display issue mentioned in issue 
40391, the internal processing is also faulty. A description follows.

1. Test method: The test text is <U00004E86> <U0002010F> <U00004E8C> 
<U0002011F>. All four are Chinese characters, and two of them are SIP 
(Supplementary Ideographic Plane) characters. The sample text was stored in a 
plain text file. The file was opened with oowriter, and normal editing 
operations, such as selecting, deleting, and copy-pasting, were performed.
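
For reference, the sample text can be written out as UTF-16 code units and as 
UTF-8 bytes. The following is a minimal Python sketch, not part of the original 
test. The helper name to_utf16_units is made up for illustration, and it assumes 
the text is held internally as UTF-16, which is what the surrogate-pair behaviour 
described in this report suggests.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units for one Unicode code point."""
    if code_point < 0x10000:
        return [code_point]                      # BMP: a single code unit
    offset = code_point - 0x10000                # 20-bit value to split
    high = 0xD800 + (offset >> 10)               # leading (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)              # trailing (low) surrogate
    return [high, low]


# The four characters used in the test; two are outside the BMP.
SAMPLE = [0x4E86, 0x2010F, 0x4E8C, 0x2011F]

units = [u for cp in SAMPLE for u in to_utf16_units(cp)]
utf8 = "".join(chr(cp) for cp in SAMPLE).encode("utf-8")

print(" ".join(f"{u:04X}" for u in units))
print(" ".join(f"{b:02X}" for b in utf8))
```

Each SIP character becomes two code units (D840 DD0F and D840 DD1F), which is 
why an editor that counts code units sees each of them as two characters.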

2. Phenomena: The SIP characters are not displayed. Instead, a blank space two 
characters wide is kept for each SIP character, and that space can in fact be 
operated on as two characters. Although the caret cannot be moved into a SIP 
character with the arrow keys, part of the character can be selected with the 
mouse. Consequently, a surrogate pair may lose its integrity during delete or 
copy-paste operations. This can be observed by monitoring the data exchanged 
during selection transfer. The target type is UTF8_STRING.
  When selecting all, got E4 BA 86 F0 A0 84 8F E4 BA 8C F0 A0 84 9F.
  When selecting the first half of <U0002010F>, got nothing.
  When selecting the second half of <U0002010F>, got 3F, the code value of '?'.
  When deleting the first half of <U0002010F>, then selecting all, got 
E4 BA 86 3F E4 BA 8C F0 A0 84 9F.
  When deleting the second half of <U0002010F>, then selecting all, got 
E4 BA 86 3F F0 A0 84 9F, indicating that the remaining partial surrogate has 
combined with the following normal character.
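
The byte sequences above can be reproduced with a deliberately naive 
UTF-16 to UTF-8 converter. The Python sketch below is only a guess at one 
behaviour that is consistent with all five observations. It is not taken from 
the OOo source, and the function names are hypothetical. It assumes that a high 
surrogate always consumes the next code unit, that a high surrogate at the very 
end of the input is dropped, and that anything invalid becomes '?'.

```python
def utf16_units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8 with a naive pairing rule.

    Hypothetical behaviour for illustration, not OOo's actual code.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:             # high (leading) surrogate
            if i + 1 == len(units):
                break                            # truncated pair: emit nothing
            nxt = units[i + 1]
            if 0xDC00 <= nxt <= 0xDFFF:          # well-formed pair
                cp = 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
                out += chr(cp).encode("utf-8")
            else:                                # next unit swallowed by mistake
                out += b"?"
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:           # stray low (trailing) surrogate
            out += b"?"
            i += 1
        else:                                    # ordinary BMP character
            out += chr(unit).encode("utf-8")
            i += 1
    return bytes(out)


def hex_bytes(data):
    """Format bytes as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in data)


# The sample text as UTF-16 code units.
FULL = [0x4E86, 0xD840, 0xDD0F, 0x4E8C, 0xD840, 0xDD1F]

print(hex_bytes(utf16_units_to_utf8(FULL)))                 # select all
print(hex_bytes(utf16_units_to_utf8(FULL[1:2])))            # first half only
print(hex_bytes(utf16_units_to_utf8(FULL[2:3])))            # second half only
print(hex_bytes(utf16_units_to_utf8(FULL[:1] + FULL[2:])))  # first half deleted
print(hex_bytes(utf16_units_to_utf8(FULL[:2] + FULL[3:])))  # second half deleted
```

In the last case the unpaired D840 swallows 4E8C and the two together become a 
single 3F, which matches the loss of <U00004E8C> seen in the test.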

3. Conclusion: Some protection exists to preserve the integrity of surrogate 
pairs during editing operations, but it is far from sufficient. Detection of 
invalid (unpaired) surrogates should be improved. More importantly, any 
operation that can break a surrogate pair must be eliminated entirely, which 
seems to require substantial rework of some of OOo's fundamental facilities.
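
As an illustration of the two protections suggested here, the Python sketch 
below snaps a selection or caret index so that it never falls between the two 
halves of a surrogate pair, and reports the positions of unpaired surrogates. 
The function names and the list-of-code-units representation are assumptions 
made for this example and do not correspond to any OOo API.

```python
def is_high(unit):
    """True for a leading (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a trailing (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def snap_to_code_point(units, index, forward=False):
    """Move a boundary index that would split a surrogate pair.

    The index is moved to the start of the pair, or to its end
    when forward is True. Other indices are returned unchanged.
    """
    splits_pair = (
        0 < index < len(units)
        and is_high(units[index - 1])
        and is_low(units[index])
    )
    if splits_pair:
        return index + 1 if forward else index - 1
    return index


def find_lone_surrogates(units):
    """Return the indices of surrogates that are not part of a valid pair."""
    lone = []
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            i += 2                               # valid pair, skip both halves
            continue
        if is_high(units[i]) or is_low(units[i]):
            lone.append(i)                       # unpaired surrogate
        i += 1
    return lone


text = [0x4E86, 0xD840, 0xDD0F, 0x4E8C, 0xD840, 0xDD1F]
start = snap_to_code_point(text, 2)                 # selection start moves back
end = snap_to_code_point(text, 2, forward=True)     # selection end moves forward
print(start, end, find_lone_surrogates(text))
```

Applying the snap to both ends of every selection, with the end moved forward, 
would make a mouse selection cover whole SIP characters only, so a delete or 
copy could no longer leave half a pair behind.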

Although I tested only one particular version of OOo, I believe the problem 
exists in all versions. I am not an OOo developer and am not sure to whom this 
issue should be assigned, but I hope that filing this bug report is of some help.

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

