Re: D-ish way to work with strings?

Robert M. Münch via Digitalmars-d-learn Fri, 27 Dec 2019 04:25:32 -0800

On 2019-12-23 15:05:20 +0000, H. S. Teoh said:

On Sun, Dec 22, 2019 at 06:27:03PM +0100, Robert M. Münch via Digitalmars-d-learn wrote:

I want to add that I'm talking about Unicode strings.


Wouldn't it make sense to handle everything as UTF-32 so that
iteration is simple because code-point = code-unit?

And later on, convert to UTF-16 or UTF-8 on demand?
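Here is a minimal D sketch of the UTF-32 idea. It assumes Phobos' std.conv.to for the conversions, and the sample string is only an illustration. Indexing a dstring yields whole code points, and converting back to UTF-8 or UTF-16 takes a single call:

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string  s8  = "h\u00E9llo";    // UTF-8: U+00E9 'é' takes 2 code units
    dstring s32 = s8.to!dstring;   // UTF-32: exactly one code unit per code point

    writeln(s8.length);   // code units (bytes) in the UTF-8 string
    writeln(s32.length);  // code units == code points in the UTF-32 string
    writeln(s32[1]);      // indexing a dstring yields a whole code point

    // Convert back on demand: to!string for UTF-8, to!wstring for UTF-16.
    assert(s32.to!string == s8);
    assert(s32.to!wstring.length == 5); // each of these code points fits in one UTF-16 unit
}
```

This prints 6, then 5, then the é.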

[...]

Be careful that code point != "character" the way most people understand
the word "character".

I know. My point was that in UTF-8, code points (which are not characters) have different sizes, which you need to take into account if you want to iterate by code points.
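For comparison, you can iterate a UTF-8 string by code point without converting it first. The sketch below uses a sample string chosen only for illustration. It uses std.utf.stride to step over the variable-width code points, and it also shows foreach with a dchar loop variable, which decodes on the fly:

```d
import std.stdio : writefln;
import std.utf : stride;

void main()
{
    string s = "a\u00E9\u20AC\U0001F600"; // 1-, 2-, 3- and 4-byte code points

    // Walk the UTF-8 code units, stepping by each code point's encoded size.
    size_t i = 0;
    while (i < s.length)
    {
        immutable n = stride(s, i);   // code units in the code point starting at i
        writefln("offset %s: %s byte(s)", i, n);
        i += n;
    }

    // Or let foreach decode for you: declaring the loop variable as dchar
    // makes D decode the UTF-8 into code points as it iterates.
    size_t count;
    foreach (dchar c; s)
        ++count;
    assert(count == 4);
}
```

The loop reports sizes of 1, 2, 3 and 4 bytes at offsets 0, 1, 3 and 6.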

The word you're looking for is "grapheme", which, unfortunately, is rather complex and very slow to handle in
Unicode. See std.uni.byGrapheme.
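To make the difference between code points and graphemes concrete, here is a small sketch that counts the same text three ways. The input string is an assumption chosen for illustration, and std.uni.byGrapheme does the segmentation:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived
    // character built from two code points.
    string s = "e\u0301";

    writeln(s.length);                // UTF-8 code units
    writeln(s.walkLength);            // code points (strings auto-decode as ranges)
    writeln(s.byGrapheme.walkLength); // grapheme clusters
}
```

The expected output is 3, 2 and 1.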

Yes, that's where we come to "characters". And a "grapheme" can consist of several code points. Is grapheme handling just slow in D, or in general? If it's the latter, well, then that's just how it is.

Usually you want to just stick with UTF-8 (or UTF-16 for
Windows and Java interop). UTF-32 wastes a lot of space, and *still*
doesn't give you what you think you want, and Grapheme[] is just dog
slow because of the amount of decoding/recoding needed to manipulate it.
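The following sketch illustrates both points. It uses a decomposed "noël", chosen for illustration. UTF-32 stores every code point in four bytes, so this mostly-ASCII text takes more than three times the space it needs in UTF-8. Indexing the dstring also still lands in the middle of the "ë" grapheme:

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string  s8  = "noe\u0308l";   // "noël" spelled with a combining diaeresis
    dstring s32 = s8.to!dstring;

    // Storage: bytes used by each encoding of the same text.
    writeln(s8.length  * char.sizeof);   // 6 code units * 1 byte
    writeln(s32.length * dchar.sizeof);  // 5 code units * 4 bytes

    // UTF-32 still splits the grapheme: index 2 is the bare 'e',
    // and the diaeresis is a separate element at index 3.
    assert(s32[2] == 'e');
    assert(s32[3] == '\u0308');
}
```

This prints 6 and 20.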


I need to handle graphemes when things are going to be rendered and edited.

What are you planning to do with your strings?

Pretty simple: user-editable content that is rendered using different fonts supporting Unicode.

So, all the editing functions (insert, replace, delete) at any location in the string, supporting all Unicode characters.
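One possible approach to the editing case is sketched below. It assumes the text stays in UTF-8 and that cursor positions are counted in graphemes. std.uni.graphemeStride returns the length, in code units, of the grapheme cluster starting at a given index. That lets you map a grapheme position to a byte offset, after which ordinary slicing and concatenation perform the edit. The helpers graphemeOffset, insertAt and deleteAt are hypothetical names, not Phobos functions:

```d
import std.uni : graphemeStride;

/// Hypothetical helper: byte offset where grapheme number `n` starts
/// (or s.length if the string has fewer graphemes).
size_t graphemeOffset(string s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += graphemeStride(s, i); // code units in the cluster starting at i
        --n;
    }
    return i;
}

/// Hypothetical helper: insert `text` before grapheme `n`.
string insertAt(string s, size_t n, string text)
{
    immutable at = graphemeOffset(s, n);
    return s[0 .. at] ~ text ~ s[at .. $];
}

/// Hypothetical helper: delete `count` graphemes starting at grapheme `n`.
string deleteAt(string s, size_t n, size_t count)
{
    immutable from = graphemeOffset(s, n);
    immutable end  = from + graphemeOffset(s[from .. $], count);
    return s[0 .. from] ~ s[end .. $];
}

void main()
{
    import std.stdio : writeln;

    string s = "noe\u0308l";       // 4 graphemes: n, o, ë, l
    writeln(deleteAt(s, 2, 1));    // removes 'e' and U+0308 together
    writeln(insertAt(s, 3, "!"));  // inserts after the whole "ë"
}
```

This prints "nol" and "noë!l". A replace operation is a delete followed by an insert at the same grapheme position.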


Best regards.

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
