https://bz.apache.org/ooo/show_bug.cgi?id=128019

--- Comment #3 from [email protected] ---
We have C string structs and C++ string wrapper classes around those, found in
main/sal, in ASCII and "Unicode" (UTF-16) versions, with 2^32 chars max length.

Another 2 are in main/tools, 2^16 chars max length, used by Calc, StarBasic,
and possibly more. Storing the string length in a 16-bit instead of a 32-bit
field probably saves a lot of space in spreadsheets with lots of cells; Excel
also does this.

Apart from being based on sal_Char / sal_Unicode instead of native C++ types,
they contain many functions not found in C++ standard library strings, e.g.
conversion to/from integer and double, string tokenization, interning,
comparison of Unicode strings against ASCII, etc.
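
To illustrate the kind of helpers meant here, below is a minimal sketch of two
of them written against std::u16string. The names equalsAscii and getToken and
their signatures are hypothetical and chosen for illustration only; they are
not the actual sal or tools APIs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: compare a UTF-16 string against a NUL-terminated
// 7-bit ASCII string, code unit by code unit.
bool equalsAscii(const std::u16string& s, const char* ascii)
{
    std::size_t i = 0;
    for (; i < s.size() && ascii[i] != '\0'; ++i)
    {
        if (s[i] != static_cast<char16_t>(static_cast<unsigned char>(ascii[i])))
            return false;
    }
    // Equal only if both strings ended at the same position.
    return i == s.size() && ascii[i] == '\0';
}

// Hypothetical helper: return the n-th (0-based) token of s, split on sep.
// Returns an empty string if s has fewer than n + 1 tokens.
std::u16string getToken(const std::u16string& s, std::size_t n, char16_t sep)
{
    std::size_t start = 0;
    for (std::size_t k = 0; k < n; ++k)
    {
        std::size_t pos = s.find(sep, start);
        if (pos == std::u16string::npos)
            return std::u16string();
        start = pos + 1;
    }
    std::size_t end = s.find(sep, start);
    if (end == std::u16string::npos)
        end = s.size();
    return s.substr(start, end - start);
}
```

With std::u16string, helpers like these would have to live as free functions
or in a wrapper class, which is part of what a migration would need to cover.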

Given the move to UTF-8-only languages lately (Go, Rust), and the UTF-8
Everywhere manifesto (https://utf8everywhere.org), we could consider
eliminating the UTF-16 strings, and using the ASCII strings as UTF-8. That
would however require fixing all code to traverse code points instead of code
units, something the existing code probably gets wrong already.
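
As a rough sketch of what traversing code points instead of code units means
for UTF-8, the function below decodes a byte string into code points. The name
decodeUtf8 is hypothetical and not part of sal. The sketch substitutes U+FFFD
for malformed sequences and, for brevity, does not reject overlong encodings
or encoded surrogates, which a production decoder should do.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 code units (bytes) into code points.
// Malformed or truncated sequences yield U+FFFD, one per offending lead byte.
std::vector<char32_t> decodeUtf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; } // stray or invalid byte

        // Truncated sequence at the end of the input.
        if (i + extra >= s.size() && extra > 0)
        {
            out.push_back(0xFFFD);
            break;
        }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }

        if (!ok) { out.push_back(0xFFFD); ++i; continue; } // resync at next byte
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the 10-byte string "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80" holds
4 code points (U+0061, U+00E9, U+20AC, U+1F600), so any code that indexes or
counts by code unit would see 10 "characters" where there are 4.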
