> From: "Arnaud Clère" <arnaud.cl...@minmaxmedical.com>
> > And I don't want to add QUtf8String until SG16's char8_t gets settled. 
> > It'll probably be settled by C++20, which means we can probably work on 
> > this during Qt 6 lifetime, possibly even 6.1 or 6.2.
> 
> It makes sense to avoid future incompatibilities with the standard but 
> fortunately Qt sometimes chooses to solve real problems ahead in time  ;-)

Well C++20 is really how many months away? Qt6 won't be released until when? It 
seems like both of these might land at the same time, except that the "by 
C++20" is (AFAICT) speculation. Uptake will also be slow. But by Qt being first 
we can get experience with the nature of the solution which might help inform 
the standard, or vice-versa. The risk is that we do something that conflicts 
with the standard in a useful way that people like, and then we have 
fragmentation. 

Far smarter people than I have worked on this, so again burn this with fire, 
but my current thinking is this. The problem is how all these encodings are 
implemented: they are basically escape codes, so you can't say where the 
current character ends and the next begins without decoding as you go. This of 
course kills speed, but that's what we get for having more than one language 
on the planet plus emojis. It seems to me that the only real solution to keep 
it all fast is to progressively upgrade from bytes to the widest character 
needed and use that. This costs a scan when the text enters the address space, 
unless the width is declared to the compiler or by the load method. If memory 
is a concern, the only alternative I see is to create a complex string: a 
"string" becomes an array of character arrays, each of uniform width, and we 
hope there is only ever one:
"Ground control to Major Tom" - single sequence of 8 bit chars, len 27 size 27
"niños." encoded as 3 "strings", total length 6, size 7:
+ "ni" - "ni" (8 bit char sequence of 2 char)
+ "ñ" - 00000000 11110001 (UTF16 16 bit char sequence of 1 char)
+ "os." - "os." (8 bit char sequence of 3 char)
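
To make the run idea above concrete, here is a minimal C++17 sketch. All of 
the names (Run, SegmentedString, unitWidth, widthOf) are made up for 
illustration and are not Qt or standard library APIs. It assumes, as in the 
"niños." example, that 8 bit runs hold ASCII only, so ñ lands in a 16 bit run, 
and that anything beyond U+FFFF goes into a 32 bit run.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical sketch; none of these names exist in Qt or the standard library.
// A run is a sequence of characters that all share one code-unit width.
using Run = std::variant<std::string, std::u16string, std::u32string>;

// Narrowest unit width in bytes for one code point.
// Assumption (following the example): 8 bit runs hold ASCII only.
inline std::size_t unitWidth(char32_t cp) {
    if (cp <= 0x7F) return 1;
    if (cp <= 0xFFFF) return 2;
    return 4;
}

// Width in bytes of the units stored in a run.
inline std::size_t widthOf(const Run& run) {
    return std::visit([](const auto& s) {
        return sizeof(typename std::decay_t<decltype(s)>::value_type);
    }, run);
}

class SegmentedString {
public:
    explicit SegmentedString(const std::u32string& codePoints) {
        for (char32_t cp : codePoints) append(cp);
    }

    // Append one code point, opening a new run whenever the width changes.
    void append(char32_t cp) {
        assert(cp <= 0x10FFFF);  // largest Unicode code point
        const std::size_t w = unitWidth(cp);
        if (runs_.empty() || widthOf(runs_.back()) != w) {
            if (w == 1)      runs_.emplace_back(std::string());
            else if (w == 2) runs_.emplace_back(std::u16string());
            else             runs_.emplace_back(std::u32string());
        }
        std::visit([cp](auto& s) {
            using Unit = typename std::decay_t<decltype(s)>::value_type;
            s.push_back(static_cast<Unit>(cp));
        }, runs_.back());
    }

    // Number of characters across all runs.
    std::size_t length() const {
        std::size_t n = 0;
        for (const Run& r : runs_)
            n += std::visit([](const auto& s) { return s.size(); }, r);
        return n;
    }

    // Bytes of character data only; per-run bookkeeping is not counted.
    std::size_t sizeBytes() const {
        std::size_t n = 0;
        for (const Run& r : runs_)
            n += widthOf(r) * std::visit([](const auto& s) { return s.size(); }, r);
        return n;
    }

    std::size_t runCount() const { return runs_.size(); }

private:
    std::vector<Run> runs_;
};
```

Built from the code points of "niños." (written as U"ni\u00F1os."), this gives 
3 runs, length 6 and 7 bytes of character data, matching the breakdown above. 
The catch is that finding the nth character now means walking the runs, so the 
scheme only stays fast while the run count stays small.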

Over 20 years ago I read, in Dr Dobbs or some other print medium, that one of 
the old BASICs (I forget which one) stored strings as a linked list of 
characters; I'm adapting that idea. There are many tradeoffs, but until we're 
ok with 32 bit characters, there will be tradeoffs on a multi-language planet. 

I just don't think escape codes should ever be stored in memory. Disk is fine. 
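
One way to read that: decode the variable-length form once, at the boundary 
where text is loaded, and keep only fixed-width characters in memory. The 
sketch below assumes the "escape codes" are UTF-8 multi-byte sequences and 
that the input is well formed; decodeUtf8 is a hypothetical helper, not a Qt 
or standard library function, and real code would have to reject truncated, 
overlong and otherwise invalid sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: turn UTF-8 bytes into one code point per element.
// Assumes well-formed input; it does not validate continuation bytes.
inline std::u32string decodeUtf8(std::string_view bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes that follow
        char32_t cp = 0;
        if (lead < 0x80)             { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x6) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0xE) { cp = lead & 0x0F; extra = 2; }
        else                         { cp = lead & 0x07; extra = 3; }
        assert(i + extra < bytes.size());  // sequence must not be truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

After this one scan, the nth character is simply element n of the result, at 
the price of 4 bytes per character.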

"Better to remain silent and be thought a fool than to speak and to remove all 
doubt." - (Disputed). I think I may have broken that rule here. "Please, be 
gentle." - Peter Venkman

_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
