Re: Some questions about strings

Denis via Digitalmars-d-learn Sun, 21 Jun 2020 20:46:36 -0700

On Monday, 22 June 2020 at 03:24:37 UTC, Adam D. Ruppe wrote:

On Monday, 22 June 2020 at 03:17:54 UTC, Denis wrote:
- First, is there any difference between string, wstring and dstring?
Yes, they encode the same content differently in the bytes. If you cast it to ubyte[] and print that out you can see the difference.
- Are the characters of a string stored in memory by their Unicode codepoint(s), as opposed to some other encoding?
no, they are encoded in utf-8, 16, or 32 for string, wstring, and dstring respectively.
- Can a series of codepoints, appropriately padded to the required width, and terminated by a null character, be directly assigned to a string WITHOUT GOING THROUGH A DECODING/ENCODING TRANSLATION?
no, they must be encoded. Unicode code points are an abstract concept that must be encoded somehow to exist in memory (similar to the idea of a number).
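
For anyone reading along, the ubyte[] cast mentioned above can be sketched roughly as follows. This is only an illustration: the helper name rawBytes and the sample character U+00E9 are made up for the example, and the byte order shown for wstring and dstring depends on the machine's endianness.

```d
import std.stdio : writeln;

// Hypothetical helper: view any D string's storage as raw bytes.
// The array cast reinterprets the memory; nothing is decoded or copied.
const(ubyte)[] rawBytes(T)(const(T)[] text)
{
    return cast(const(ubyte)[]) text;
}

void main()
{
    // The same single code point, U+00E9, in the three string types.
    string  s = "\u00E9";   // UTF-8:  two 8-bit code units
    wstring w = "\u00E9"w;  // UTF-16: one 16-bit code unit
    dstring d = "\u00E9"d;  // UTF-32: one 32-bit code unit

    writeln(rawBytes(s)); // 2 bytes
    writeln(rawBytes(w)); // 2 bytes, order depends on endianness
    writeln(rawBytes(d)); // 4 bytes, order depends on endianness
}
```

Note that .length on each string counts code units, not bytes or code points, so s.length is 2 here while w.length and d.length are both 1.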

OK, then that actually simplifies what's needed, because I won't need to decode the UTF-8, only validate it.

My code reads a UTF-8 encoded file into a buffer and validates, byte by byte, the UTF-8 encoding along with some additional validation. If I simply return the UTF-8 encoded string, there won't be another decoding/encoding done -- correct?
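
A minimal sketch of what I have in mind, with assumptions flagged: the function name asValidatedString is hypothetical, and std.utf.validate stands in here for my own byte-by-byte validator (it checks well-formedness only, none of the additional validation).

```d
import std.utf : validate;

// Hypothetical helper: check a byte buffer and hand it back as a string.
string asValidatedString(immutable(ubyte)[] buffer)
{
    // Reinterpret the bytes as UTF-8 code units: no copy, no transcoding.
    auto text = cast(string) buffer;

    // Stand-in for the custom validator; throws UTFException
    // if the data is not well-formed UTF-8.
    validate(text);

    return text;
}

void main()
{
    immutable(ubyte)[] raw = [0xC3, 0xA9]; // UTF-8 for U+00E9
    string s = asValidatedString(raw);

    // The returned string is a view of the very same memory.
    assert(cast(const(void)*) s.ptr is cast(const(void)*) raw.ptr);
}
```

If that is right, the returned string is just the original buffer under a different type, so nothing gets re-encoded on the way out.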
