Unicode---Give us all of it!
============================

Unicode encodes characters in a codespace that ranges from 0 to 0x10FFFF. Much of the OOo code base operates on UTF-16 code units that range from 0 to 0xFFFF:

- C/C++ code based on sal_Unicode.

- Java code based on Java char.

- UNO based on UNO CHAR.

It is obvious that a single UTF-16 code unit cannot represent all of Unicode. Thus, UTF-16 is designed in such a way that each Unicode character can be represented in UTF-16 as an ordered sequence of at most two code units: Characters in the ranges U+0000--D7FF and U+E000--FFFF are represented by a single UTF-16 code unit (of the respective numeric value). Characters in the range U+10000--10FFFF are represented by two UTF-16 code units, a high surrogate in the range 0xD800--DBFF followed by a low surrogate in the range 0xDC00--DFFF.
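The mapping just described can be written down in a few lines. The following is only a sketch: encodeUtf16 is a made-up name, and char32_t and char16_t are used as stand-ins for sal_uInt32 and sal_Unicode so that the example compiles without the OOo headers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of any OOo API): encode one Unicode
// character as a sequence of one or two UTF-16 code units.
// char32_t / char16_t stand in for sal_uInt32 / sal_Unicode.
std::vector<char16_t> encodeUtf16(char32_t cp) {
    // Only values in 0..0x10FFFF outside the surrogate range can be encoded.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // U+0000--D7FF and U+E000--FFFF: a single code unit of the same value.
        return { static_cast<char16_t>(cp) };
    }
    // U+10000--10FFFF: subtract 0x10000 to get a 20-bit value, then put the
    // upper 10 bits into a high surrogate and the lower 10 into a low one.
    char32_t v = cp - 0x10000;
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));
    char16_t low = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    return { high, low };
}
```

For example, U+1D11E MUSICAL SYMBOL G CLEF comes out as the pair 0xD834, 0xDD1E, while U+0041 stays the single code unit 0x0041.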

In turn, it should be obvious that treating single UTF-16 code units as representing Unicode characters does not work. However, since most Unicode characters in actual use are in the range U+0000--FFFF (and can hence faithfully be represented by a single UTF-16 code unit), this problem is not apparent in all situations. This will gradually change as Unicode characters in the range U+10000--10FFFF are used more and more frequently, especially in East Asian locales. This should be motivation to enhance OOo so that all parts of it work flawlessly with all of Unicode.
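To make the failure mode concrete, here is a small sketch of two things that go wrong when code units are taken for characters: the length of a string is overcounted, and cutting a string at an arbitrary index can split a surrogate pair. The function names are invented for illustration, and std::u16string stands in for rtl::OUString.

```cpp
#include <cstddef>
#include <string>

// Illustration only; char16_t / std::u16string stand in for
// sal_Unicode / rtl::OUString.
inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Count Unicode characters instead of UTF-16 code units.
// An unpaired surrogate is counted as one item of its own.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        bool pair = isHighSurrogate(s[i]) && i + 1 < s.size()
                    && isLowSurrogate(s[i + 1]);
        i += pair ? 2 : 1;
    }
    return n;
}

// True if cutting the string at index i would separate a high surrogate
// from the low surrogate that follows it.
bool splitsSurrogatePair(const std::u16string& s, std::size_t i) {
    return i > 0 && i < s.size()
           && isHighSurrogate(s[i - 1]) && isLowSurrogate(s[i]);
}
```

With s = u"a\U00020000b" (U+20000 is a CJK ideograph outside the range U+0000--FFFF), s.size() reports four code units although the string holds three characters, and index 2 lies in the middle of the surrogate pair.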

Java 5 addresses this problem by augmenting functionality based on single UTF-16 code units of type Java char (e.g., String.charAt) with functionality based on Unicode encoded characters of type Java int, in the range 0--0x10FFFF (e.g., String.codePointAt), and by using functionality based on UTF-16 code unit sequences of type java.lang.String. Similar solutions are needed for C/C++ code and UNO APIs.

A related problem is that Unicode combining character sequences like U+0041 LATIN CAPITAL LETTER A followed by U+20E3 COMBINING ENCLOSING KEYCAP shall be treated as single characters in certain applications. (For example, if you can specify the bullet symbol that shall precede each list item you enter in a word processor, combining character sequences could be useful choices for such a symbol.) This indicates that an application's concept of "character" is often best represented by a programming environment's concept of "string."


C/C++ Code
----------

The approach here has two parts:

- Use sal_uInt32 to represent individual Unicode encoded characters and add any necessary base functionality to rtl::OUString (e.g., operating on the individual Unicode encoded characters represented by an instance of rtl::OUString).

- Find all the places in the code that need to be adapted.
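As a rough idea of what such base functionality might look like, here is a sketch of a function that reads one Unicode encoded character from a UTF-16 string and advances an index past it. Everything here is an assumption for illustration: nextCodePoint and toCodePoints are invented names, std::u16string stands in for rtl::OUString, and char32_t stands in for sal_uInt32. How unpaired surrogates should be handled is a design decision; this sketch simply passes them through.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the Unicode character that starts at
// s[index] and advance index past it (by one or two code units).
// std::u16string / char32_t stand in for rtl::OUString / sal_uInt32.
char32_t nextCodePoint(const std::u16string& s, std::size_t& index) {
    assert(index < s.size());
    char16_t first = s[index++];
    if (first >= 0xD800 && first <= 0xDBFF && index < s.size()) {
        char16_t second = s[index];
        if (second >= 0xDC00 && second <= 0xDFFF) {
            ++index;
            // Recombine the two 10-bit halves and add back 0x10000.
            return 0x10000
                + ((static_cast<char32_t>(first) - 0xD800) << 10)
                + (static_cast<char32_t>(second) - 0xDC00);
        }
    }
    // Single-unit character, or an unpaired surrogate passed through as is.
    return first;
}

// Example use: walk a string character by character.
std::vector<char32_t> toCodePoints(const std::u16string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        out.push_back(nextCodePoint(s, i));
    }
    return out;
}
```

Code that today loops over sal_Unicode values by index would instead loop with such a function, so that a surrogate pair is always seen as one value.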


Java Code
---------

No Java code within OOo that would need to be adapted has been identified. (Any necessary adaptation would be complicated by the fact that OOo must remain compatible with Java 1.3.1, so that much of the functionality offered by Java 5 would not be available.)


UNO APIs
--------

Replace (if unpublished) or supersede (if published) any API that uses CHAR with a corresponding API that uses STRING. Find attached a list of all occurrences of CHAR within the API (types.rdb) of SRC680m193.


How to proceed
--------------

As a first step, I will try to identify and gather as many places in OOo that need to be adapted as possible, but I need your help for that: IF YOU KNOW OF ANY PLACE IN OOo THAT NEEDS TO BE ADAPTED, PLEASE LET ME KNOW.

Once all places have been identified, we can see how to address the task of adapting them accordingly.


-Stephan
com/sun/star/accessibility/XAccessibleText: char getCharacter([in] long nIndex)
com/sun/star/awt/KeyEvent: char KeyChar
com/sun/star/awt/KeyStroke: char KeyChar
com/sun/star/awt/SimpleFontMetric: char FirstChar
com/sun/star/awt/SimpleFontMetric: char LastChar
com/sun/star/awt/XFont: sequence<short> getCharWidths([in] char nFirst, [in] char nLast)
com/sun/star/awt/XFont: short getCharWidth([in] char c)
com/sun/star/awt/XFont: void getKernPairs([out] sequence<char> Chars1, [out] sequence<char> Chars2, [out] sequence<short> Kerns)
com/sun/star/awt/XTextEditField: void setEchoChar([in] char cEcho)
com/sun/star/i18n/XExtendedInputSequenceChecker: long correctInputSequence([inout] string aText, [in] long nPos, [in] char cInputChar, [in] short nInputCheckMode)
com/sun/star/i18n/XExtendedTransliteration: char transliterateChar2Char([in] char cChar)
com/sun/star/i18n/XExtendedTransliteration: string transliterateChar2String([in] char cChar)
com/sun/star/i18n/XInputSequenceChecker: boolean checkInputSequence([in] string aText, [in] long nPos, [in] char cInputChar, [in] short nInputCheckMode)
com/sun/star/io/XDataInputStream: char readChar()
com/sun/star/io/XDataOutputStream: void writeChar([in] char Value)
com/sun/star/io/XTextInputStream: string readString([in] sequence<char> Delimiters, [in] boolean bRemoveDelimiter)
com/sun/star/style/TabStop: char DecimalChar
com/sun/star/style/TabStop: char FillChar
com/sun/star/test/bridge/TestSimple: char Char
com/sun/star/test/bridge/XBridgeTest2: sequence<char> setSequenceChar([in] sequence<char> aSeq)
com/sun/star/test/bridge/XBridgeTest2: void setSequencesInOut([inout] sequence<boolean> aSeqBoolean, [inout] sequence<char> aSeqChar, ...)
com/sun/star/test/bridge/XBridgeTest2: void setSequencesOut([out] sequence<boolean> aSeqBoolean, [out] sequence<char> aSeqChar, ...)
com/sun/star/test/bridge/XBridgeTestBase: [attribute] char Char
com/sun/star/test/bridge/XBridgeTestBase: com/sun/star/test/bridge/TestData getValues([out] boolean bBool, [out] char cChar, ...)
com/sun/star/test/bridge/XBridgeTestBase: com/sun/star/test/bridge/TestData setValues2([inout] boolean bBool, [inout] char cChar, ...)
com/sun/star/test/bridge/XBridgeTestBase: void setValues([in] boolean bBool, [in] char cChar, ...)
com/sun/star/test/performance/SimpleTypes: char Char
com/sun/star/text/TextSortDescriptor2: [property] char Delimiter
com/sun/star/text/TextSortDescriptor: [property] char Delimiter
