Re: Handling arbitrary char ranges

Alex Parrill via Digitalmars-d-learn Wed, 20 Apr 2016 19:41:46 -0700

On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:

On 20.04.2016 23:59, Alex Parrill wrote:
On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
[...]
First, you can't assign anything to a void[], for the same reason you can't dereference a void*. This includes the slice assignment that you are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.
Not true. You can assign any dynamic array to a void[].

That's not assigning the elements of a void[]; it's just changing what the slice points to and adjusting the length, like doing `void* ptr = someOtherPtr;`
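
To illustrate the distinction, here is a minimal sketch (the variable names are made up for the example, not taken from the original code). It assumes a reasonably recent dmd and only shows that assigning a dynamic array to a void[] rebinds the slice rather than copying elements:

```d
void main()
{
    int[] ints = [1, 2, 3];

    // Rebinding: v now refers to the same memory as ints; nothing is copied.
    void[] v = ints;
    assert(v.ptr is cast(void*) ints.ptr);

    // The length of a void[] is counted in bytes, not in elements.
    assert(v.length == ints.length * int.sizeof);

    // Because the memory is shared, a write through ints is visible via v.
    ints[0] = 42;
    assert((cast(int[]) v)[0] == 42);
}
```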

Regarding vector notation, the spec doesn't seem to mention how it interacts with void[], but dmd accepts this no problem:
----
int[] i = [1, 2, 3];
auto v = new void[](3 * int.sizeof);
v[] = i[];
----

It only seems to work on arrays, not arbitrary ranges, sliceable or not. Though see below.

[...]
Second, don't use slicing on ranges (unless you need it). Not all ranges support it...
As far as I see, the slicing code is guarded by `static if(isArray!T)`. Arrays support slicing.
[...]
Instead, use a loop (or maybe `put`) to fill the array.
That's what's done in the `else` path, no?


Yes, I did not see the static if condition, my bad.

Third, don't treat text as bytes; encode your characters.

     auto schema = EncodingScheme.create("utf-8");
     auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);

     auto buf = new ubyte[](100);
     auto currentPos = buf;
     while(!range.empty && schema.encodedLength(range.front) <= currentPos.length) {
         auto written = schema.encode(range.front, currentPos);
         currentPos = currentPos[written..$];
         range.popFront();
     }
     buf = buf[0..buf.length - currentPos.length];

You're "converting" chars to UTF-8 here, right? That's a nop. char is a UTF-8 code unit already.


It can be either chars, wchars, or dchars.

(PS there ought to be a range in Phobos that encodes each character, something like map maybe)
std.utf.byChar and friends:

https://dlang.org/phobos/std_utf.html#.byChar

byChar would work. byWchar and byDchar might cause endianness issues.
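
As a rough sketch of both points (not the code from the original question): the first half re-encodes a wstring to UTF-8 code units with `std.utf.byChar` and copies them into a ubyte buffer, and the second half shows why a single wchar written out as raw bytes depends on byte order. The sample strings, the buffer size, and the use of `std.string.representation` and `std.bitmanip` for the checks are my own choices for illustration.

```d
import std.bitmanip : nativeToBigEndian, nativeToLittleEndian;
import std.string : representation;
import std.utf : byChar;

void main()
{
    // Input as UTF-16 code units; byChar lazily re-encodes it as UTF-8.
    wstring input = "h\u00e9llo"w;

    auto buf = new ubyte[](100);
    size_t n = 0;
    foreach (char c; input.byChar)
    {
        if (n == buf.length)
            break; // buffer full; stop rather than overflow
        buf[n++] = cast(ubyte) c;
    }

    // UTF-8 code units are single bytes, so byte order never matters here.
    assert(buf[0 .. n] == "h\u00e9llo".representation);

    // A wchar is two bytes wide, so its byte layout depends on endianness.
    wchar w = '\u00e9';
    ubyte[2] le = nativeToLittleEndian(w);
    ubyte[2] be = nativeToBigEndian(w);
    assert(le == [0xE9, 0x00]);
    assert(be == [0x00, 0xE9]);
}
```

If the bytes have to be UTF-16 or UTF-32, picking one byte order explicitly, as in the second half, is one way to sidestep the problem.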
