Re: More fun with autodecoding

Chris via Digitalmars-d Mon, 10 Sep 2018 01:50:33 -0700

On Saturday, 8 September 2018 at 15:36:25 UTC, Steven Schveighoffer wrote:

On 8/9/18 2:44 AM, Walter Bright wrote:

So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter.
splitter doesn't deal with encodings of character ranges at all.

For instance, when you have this:

"abc 123".byCodeUnit.splitter;
What happens is splitter only has one overload that takes one parameter, and that requires a character *array*, not a range.
So the byCodeUnit result is aliased-this to its original, and surprise! the elements from that splitter are string.
Next, I tried to use a parameter:

"abc 123".byCodeUnit.splitter(" ");
Nope, still devolves to string. It turns out it can't figure out how to split character ranges using a character array as input.
The only thing that does seem to work is this:

"abc 123".byCodeUnit.splitter(" ".byCodeUnit);
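
To check this on a given compiler, a minimal sketch like the following prints the element type each of the three calls above produces. The variable names are only for illustration, and no expected output is shown because the result (and possibly whether every line compiles) depends on the Phobos version in use.

```d
import std.algorithm.iteration : splitter;
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main()
{
    // One-parameter overload: splits on whitespace.
    auto noArg = "abc 123".byCodeUnit.splitter;

    // Separator given as a plain string.
    auto strSep = "abc 123".byCodeUnit.splitter(" ");

    // Separator wrapped in byCodeUnit as well.
    auto rangeSep = "abc 123".byCodeUnit.splitter(" ".byCodeUnit);

    // Print the element type of each resulting range.
    writeln(ElementType!(typeof(noArg)).stringof);
    writeln(ElementType!(typeof(strSep)).stringof);
    writeln(ElementType!(typeof(rangeSep)).stringof);
}
```

If the element type comes out as string, the byCodeUnit wrapper was dropped before splitter saw the range.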

After a while your code will be cluttered with absurd stuff like this: `.byCodeUnit`, `.byGrapheme`, `.array`, etc. Because of my experience with `splitter` et al., I tried to create my own parser to have better control over every step. After a few *minutes* of testing things I ran into this bug [1], which didn't get fixed until early 2018. I never started to write my own step-by-step parser. I'm glad I didn't.

I wish people would begin to realize that string handling is a basic necessity and that the correct handling of strings is of utmost importance. Please keep us updated on how things work out (or not) for you.

[Please, nobody answer my post pointing out that a) we don't understand Unicode and b) it's an insult to the Universe to draw attention to flaws that keep pestering us on an almost daily basis without trying to fix them ourselves stante pede. As is clear from Steve's efforts, the Universe doesn't seem to care.]


[1] https://issues.dlang.org/show_bug.cgi?id=16739

[snip]
