On Tuesday, 11 January 2022 at 12:22:36 UTC, WebFreak001 wrote:
[snip]
you can relatively easily find out how many bytes a string
takes up with `std.utf`. You can also iterate by code points
there, or by graphemes with `std.uni`, if you want to translate
some kind of character index to byte position.
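
A minimal sketch of the point above, assuming only Phobos (`std.utf`, `std.uni`, `std.range`). The string is written with `\u` escapes so that the byte counts do not depend on how the source file is normalized:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar, toUTFindex;

void main()
{
    // "áéíóú" as precomposed code points, 2 UTF-8 bytes each
    string s = "\u00E1\u00E9\u00ED\u00F3\u00FA";

    assert(s.length == 10);               // .length counts UTF-8 code units (bytes)
    assert(s.byDchar.walkLength == 5);    // code points
    assert(s.byGrapheme.walkLength == 5); // grapheme clusters

    // translate a code point index into a byte offset
    immutable offset = s.toUTFindex(2);
    assert(offset == 4);
    assert(s[offset .. $] == "\u00ED\u00F3\u00FA");
}
```

`toUTFindex` walks the string from the start, so repeated lookups on long strings cost linear time each.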
HOWEVER it's not clear what a character is. Sure for the posted
cases here it's no problem but when it comes to languages based
on combining glyphs together to form new glyphs it's no longer
clear what is a character. There are Graphemes (grapheme
clusters) which are probably the closest to what everybody
would think a character is, but IIRC there are edge cases a
programmer wouldn't expect, like appending a character not
increasing the character count of the string because it
merges with the last grapheme. Additionally there is a
performance impact on using Graphemes over simpler things like
code points, which fit 98% of use-cases with strings. A code point
in D maps 1:1 to a `dchar` and takes up to 2 `wchar`s or up to 4
`char`s. You can use `std.utf` to compute the byte length of a
code point in a given string.
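
To illustrate the grapheme edge case and the code unit counts mentioned above, here is a small sketch using Phobos only. The sample characters are my own choice, not taken from the thread:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar, codeLength, stride;

void main()
{
    string s = "e";
    assert(s.byGrapheme.walkLength == 1);

    // appending U+0301 COMBINING ACUTE ACCENT grows the byte and
    // code point counts, but it merges into the previous grapheme
    s ~= "\u0301";
    assert(s.length == 3);                // 1 byte for 'e' + 2 bytes for U+0301
    assert(s.byDchar.walkLength == 2);    // two code points
    assert(s.byGrapheme.walkLength == 1); // still one grapheme

    // code units needed to encode a single code point
    assert(codeLength!char('\u00E9') == 2);      // é in UTF-8
    assert(codeLength!char('\U0001F600') == 4);  // an emoji in UTF-8
    assert(codeLength!wchar('\U0001F600') == 2); // surrogate pair in UTF-16
    assert(codeLength!dchar('\U0001F600') == 1); // always 1 in UTF-32

    // length in code units of the sequence starting at a given index
    string t = "\u00E9x";
    assert(stride(t, 0) == 2);
    assert(stride(t, 2) == 1);
}
```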
aha, i think i might have miscommunicated here - i was talking
about an error i thought i was having where a fixedstring of
`"áéíóú"` wasn't equal to a string literal of the same, but as it
turned out i was misreading the error message [i had been trying
to assign a literal larger than the fixedstring could take]. to
tell the truth, unicode awareness is... not something i really
want to mess with right now, lol. it would be nice to have the
option at some point in the future though.
I would rather suggest you support FixedString with types other
than `char` (`wchar`, `dchar`; heck, users could even use any
arbitrary type and use this as an array class). For languages that
commonly use more than 1 byte per codepoint or for interop with
Win32 unicode APIs, JavaScript strings, C# strings, UTF16 files
in general, etc. programmers might opt to use FixedString with
wchar then.
With D's templates that should be quite easy to do (add a
template parameter to the struct like `struct
FixedString(size_t maxSize, CharT = char)` and replace all
usage of char in your code with `CharT` in this case)
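
A rough sketch of what that template parameter could look like. This is not the actual code from the fixedstring repo: the fields `data` and `len`, the constructor, and `opSlice` are all hypothetical and only there to show the `CharT` substitution:

```d
struct FixedString(size_t maxSize, CharT = char)
{
    private CharT[maxSize] data; // fixed storage, counted in code units of CharT
    private size_t len;          // number of code units currently in use

    this(const(CharT)[] s)
    {
        // hypothetical policy: reject input that does not fit
        // (note that asserts are compiled out in release builds)
        assert(s.length <= maxSize, "string does not fit in FixedString");
        data[0 .. s.length] = s[];
        len = s.length;
    }

    size_t length() const { return len; }

    // view of the used part of the buffer
    const(CharT)[] opSlice() const { return data[0 .. len]; }
}

void main()
{
    // defaults to char, so lengths are UTF-8 bytes
    auto a = FixedString!16("h\u00E9llo");
    assert(a.length == 6);
    assert(a[] == "h\u00E9llo");

    // wchar storage, e.g. for Win32 or other UTF-16 interop
    auto w = FixedString!(16, wchar)("h\u00E9llo"w);
    assert(w.length == 5);
    assert(w[] == "h\u00E9llo"w);
}
```

Note that `maxSize` counts code units of `CharT` here, so the same text can fit in a `wchar` instance and not in a `char` one of the same size.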
[i've pushed an update to the repo for
this!](https://github.com/Moth-Tolias/fixedstring/releases/tag/v1.1.0) =] it was a bit more complicated than a simple replace all, but not too hard.