https://bz.apache.org/ooo/show_bug.cgi?id=128019
--- Comment #3 from [email protected] ---

We have C string structs, and C++ string wrapper classes around them, in main/sal, in ASCII and "Unicode" (UTF-16) versions, each with a 32-bit length field. Another two are in main/tools, with a 16-bit length field, used by Calc, StarBasic, and possibly more. Storing the string length in a 16-bit field instead of a 32-bit one probably saves a lot of space in spreadsheets with lots of cells; Excel also does this.

Apart from being based on sal_Char / sal_Unicode instead of native C++ types, they provide many functions not found in C++ standard library strings, e.g. conversion to/from integer and double, string tokenization, interning, comparison of Unicode strings against ASCII, etc.

Given the recent move to UTF-8-only languages (Go, Rust) and the UTF-8 Everywhere manifesto (https://utf8everywhere.org), we could consider eliminating the UTF-16 strings and using the ASCII strings as UTF-8. That would, however, require fixing all code to traverse code points instead of code units, something the existing code probably already gets wrong.
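
To illustrate the difference between code units and code points in a UTF-8 string, here is a minimal sketch in standard C++. It is not taken from main/sal or main/tools; nextCodePoint and codePoints are hypothetical names, and std::string merely stands in for our own string classes. For brevity it replaces malformed input with U+FFFD and does not reject overlong encodings or encoded surrogates, which a real implementation should probably do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode the code point that starts at s[i] and
// advance i past it. On a malformed or truncated sequence, return
// U+FFFD and advance by a single byte so the caller can resynchronize.
// Simplification: overlong forms and encoded surrogates are not rejected.
char32_t nextCodePoint(const std::string& s, std::size_t& i)
{
    assert(i < s.size());
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t cp;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { ++i; return 0xFFFD; }           // stray continuation or invalid lead byte

    if (i + len > s.size()) { ++i; return 0xFFFD; }   // truncated sequence

    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; }  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}

// Hypothetical helper: collect all code points of a UTF-8 string.
std::vector<char32_t> codePoints(const std::string& s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); )
        out.push_back(nextCodePoint(s, i));
    return out;
}
```

With this, s.size() is the number of code units (bytes) while codePoints(s).size() is the number of code points. Any code that indexes or truncates a string by code unit would need to be checked against that distinction.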
