To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=41792

Issue #:|41792
Summary:|Incorrect Handling of Surrogate Pair
Component:|gsl
Version:|OOo 1.1.2
Platform:|PC
URL:|
OS/Version:|Linux
Status:|UNCONFIRMED
Status whiteboard:|
Keywords:|
Resolution:|
Issue type:|DEFECT
Priority:|P3
Subcomponent:|code
Assigned to:|cp
Reported by:|xieqian
------- Additional comments from [EMAIL PROTECTED] Tue Feb 1 03:46:00 -0800 2005 -------

This bug is related to issue 40391. While testing non-BMP character support in OpenOffice.org 1.1.2 for Linux, as shipped with Fedora Core 3, I found problems in surrogate pair handling. Besides the display issue mentioned in issue 40391, the internal processing is also problematic. A description follows.

1. Test method:
The test text is <U00004E86> <U0002010F> <U00004E8C> <U0002011F>. All four are Chinese characters, and two of them are SIP characters. The sample text was hardcoded into a plain text file. The file was opened with oowriter, and normal editing operations such as selecting, deleting, and copy-pasting were performed.

2. Phenomena:
The SIP characters cannot be displayed. A blank space is kept for each SIP character; it occupies the width of two characters and can actually be operated on as two characters. Although one cannot move the caret into a SIP character using the arrow keys, one can select part of the character with the mouse. Consequently, a surrogate pair may lose its integrity during deleting or copy-pasting operations. This can be observed by monitoring the internal form during communication. The target type is UTF8_STRING.

When selecting all, got E4 BA 86 F0 A0 84 8F E4 BA 8C F0 A0 84 9F.
When selecting the first half of <U0002010F>, got nothing.
When selecting the second half of <U0002010F>, got 3F, the code value of '?'.
When deleting the first half of <U0002010F>, then selecting all, got E4 BA 86 3F E4 BA 8C F0 A0 84 9F.
When deleting the second half of <U0002010F>, then selecting all, got E4 BA 86 3F F0 A0 84 9F, indicating that the partial surrogate is combining with a normal character: the bytes for <U00004E8C> (E4 BA 8C) are missing from the output.

3. Conclusion:
There is some protection that preserves the integrity of surrogate pairs during editing operations, but it is far from enough. Recognition of invalid surrogate characters should be enhanced.
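To make the byte sequences above easier to check, here is a minimal Python sketch. It is not OOo code; the function names (utf16_units, units_to_utf8, snap_to_boundary) are hypothetical and exist only for this illustration. It assumes the text is held internally as UTF-16 code units, shows how the test text maps to surrogate pairs and to UTF-8, and shows one possible way to handle an unpaired surrogate so that it does not consume the following character.

```python
# Test text from the report: two BMP characters and two SIP characters.
TEXT = "\u4e86\U0002010f\u4e8c\U0002011f"


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high(u):
    """True if u is a high (leading) surrogate code unit."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    """True if u is a low (trailing) surrogate code unit."""
    return 0xDC00 <= u <= 0xDFFF


def units_to_utf8(units):
    """Encode UTF-16 code units as UTF-8.

    An unpaired surrogate becomes '?' (0x3F) and consumes only itself,
    so it can never swallow the code unit that follows it.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out += chr(cp).encode("utf-8")
            i += 2
        elif is_high(u) or is_low(u):
            # Lone surrogate: replace it, but keep the next unit intact.
            out += b"?"
            i += 1
        else:
            out += chr(u).encode("utf-8")
            i += 1
    return bytes(out)


def snap_to_boundary(units, pos):
    """Move a caret or selection offset off the middle of a surrogate pair."""
    if 0 < pos < len(units) and is_high(units[pos - 1]) and is_low(units[pos]):
        return pos - 1
    return pos


def hexdump(data):
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    units = utf16_units(TEXT)
    print([f"{u:04X}" for u in units])   # 4E86 D840 DD0F 4E8C D840 DD1F
    print(hexdump(units_to_utf8(units)))
    damaged = units[:2] + units[3:]      # drop the second half of U+2010F
    print(hexdump(units_to_utf8(damaged)))
    print(snap_to_boundary(units, 2))    # offset 2 splits a pair, so it moves to 1
```

With the second half of <U0002010F> removed, this sketch yields E4 BA 86 3F E4 BA 8C F0 A0 84 9F, so <U00004E8C> survives, whereas the observed output E4 BA 86 3F F0 A0 84 9F has lost it. A helper like snap_to_boundary, applied to mouse selections as well as caret movement, is one way the pair could be kept from being split in the first place.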
More importantly, any operation that can damage the integrity of a surrogate pair must be eliminated entirely, which seems to require profound changes to some of OOo's fundamental facilities.

Although only one particular version of OOo was tested, I believe the problem exists in all versions. I am not an OOo developer and am not sure to whom this issue should be assigned, but I hope that filing this bug report is of some help.

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
