Re: Thin UTF8 string wrapper

Jonathan Marler via Digitalmars-d-learn Fri, 06 Dec 2019 09:45:51 -0800

On Friday, 6 December 2019 at 16:48:21 UTC, Joseph Rushton Wakeling wrote:

Hello folks,
I have a use-case that involves wanting to create a thin struct wrapper of underlying string data (the idea is to have a type that guarantees that the string has certain desirable properties).
The string is required to be valid UTF-8. The question is what the most useful API is to expose from the wrapper: a sliceable random-access range? A getter plus `alias this` to just treat it like a normal string from the reader's point of view?
One factor that I'm not sure how to address w.r.t. a full range API is how to handle iterating over elements: presumably they should be iterated over as `dchar`, but how to implement a `front` given that `std.encoding` gives no way to decode the initial element of the string that doesn't also pop it off the front?
I'm also slightly disturbed to see that `std.encoding.codePoints` requires `immutable(char)[]` input: surely it should operate on any range of `char`?
I'm inclining towards the "getter + `alias this`" approach, but I thought I'd throw the problem out here to see if anyone has any good experience and/or advice.
Thanks in advance for any thoughts!

All the best,

     -- Joe

Good questions. I don't have answers to them all but I hope this information is helpful.

I use wrapper structs to represent properties in this way as well. For example, my "mar" library has the SentinelPtr and SentinelArray types, which guarantee that the underlying pointer and/or array is terminated by some value (i.e. like a null-terminated C string).

If I'm creating and using these wrapper types inside a self-contained program then I don't really care about API compatibility, so I would use a simple, powerful mechanism like "alias this". For libraries where the API boundary is important, I implement the most limited API I can. The reason is that a limited API lets you see every possible interaction with the type. This way, when you need to change the API, you know all the existing ways it can be interacted with and can iterate on the API design appropriately. This is the case for SentinelPtr and SentinelArray. For this case I only implement the operations I know are being used, and I made this easy by creating a simple module I call "wrap.d" (https://github.com/dragon-lang/mar/blob/master/src/mar/wrap.d).

If you have a struct that wraps a string and guarantees it's UTF8 encoded, wrap.d lets you declare that it's a wrapper type and allows you to mixin the operations you want to expose like this:


struct Utf8String
{
    private string str;
    import mar.wrap;

    // this verifies the size of the wrapper struct and the underlying field
    // are the same, and creates the wrappedValueRef method that the other
    // wrapper mixins use to access the underlying wrapped value
    mixin WrapperFor!"str";

    // Now you can mixin different operations, for example
    mixin WrapOpCast;
    mixin WrapOpIndex;
    mixin WrapOpSlice;
}
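
For comparison, here is a rough standalone sketch of the "getter + `alias this`" approach from the original question, using only Phobos rather than mar. It also shows one way to peek at the first code point without popping it: `std.utf.decode` advances a caller-supplied index rather than the string itself. The struct layout and the `get` and `firstCodePoint` names are assumptions made for illustration, not part of mar or Phobos.

```d
import std.utf : decode, validate;

struct Utf8String
{
    private string str;

    // Validate once at construction; std.utf.validate throws a
    // UTFException if the input is not well-formed UTF-8.
    this(string s)
    {
        validate(s);
        str = s;
    }

    // Read-only getter. With `alias this`, the wrapper can be used
    // wherever a string is expected, but the field stays private.
    string get() const pure nothrow @nogc @safe { return str; }
    alias get this;

    // Hypothetical helper: decode the first code point without
    // consuming it. Assumes the string is non-empty.
    dchar firstCodePoint() const
    {
        assert(str.length > 0, "firstCodePoint called on empty string");
        string s = str;   // local copy of the slice, no data is copied
        size_t i = 0;     // decode advances this index, not the string
        return decode(s, i);
    }
}

void main()
{
    import std.exception : assertThrown;
    import std.utf : UTFException;

    auto s = Utf8String("\u00E9lan");
    assert(s.firstCodePoint == '\u00E9');
    assert(s.length == 5);           // forwarded via alias this; counts code units
    string plain = s;                // implicit conversion via alias this
    assert(plain == "\u00E9lan");    // nothing was popped off the front

    // "\xff" is never valid UTF-8, so construction is rejected.
    assertThrown!UTFException(Utf8String("\xff"));
}
```

The trade-off is the one described above: `alias this` exposes every string operation at once, which is convenient inside a program but makes it hard to know which operations callers depend on.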

On the topic of immutable(char)[] vs const(char)[]: if a function takes const data, I take it to mean that the function won't change the data. If it takes immutable data, I take it to mean that the function won't change it AND the caller must ensure the data won't change while the function has it. However, in practice, functions that require immutable data still declare their data to be "const" instead of "immutable". I think this is because declaring it as immutable would require extra boilerplate all over your code to cast data to immutable all the time. So most functions end up using const even though they require immutable.
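
A small sketch of that distinction, assuming two made-up names (`countSpaces` and `Cache`) that do not come from any library. The const parameter accepts mutable, const, and immutable data, while the `string` parameter forces callers holding a `char[]` to make an immutable copy first. It also uses `std.utf.byDchar`, which, as far as I know, iterates code points over any range of `char` and so avoids the `immutable(char)[]` restriction mentioned in the question.

```d
import std.range : walkLength;
import std.utf : byDchar;

// const(char)[]: promises only that this function will not modify the data.
size_t countSpaces(const(char)[] text)
{
    size_t n = 0;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

// string, i.e. immutable(char)[]: the data can never change, so it is
// safe to keep a reference to it after the call returns.
struct Cache
{
    string last;
    void remember(string text) { last = text; }
}

void main()
{
    char[] buf = "a b c".dup;            // mutable buffer

    assert(countSpaces(buf) == 2);       // mutable data is accepted
    assert(countSpaces("a b c") == 2);   // so is an immutable literal
    assert(buf.byDchar.walkLength == 5); // code points from a mutable char[]

    Cache cache;
    // cache.remember(buf);              // does not compile: char[] is not immutable
    cache.remember(buf.idup);            // explicit immutable copy

    buf[0] = 'x';                        // later mutation of the buffer...
    assert(cache.last == "a b c");       // ...cannot affect the stored string
}
```

The commented-out call is the boilerplate cost mentioned above: every caller with mutable data needs an `.idup` (or a cast it can justify) before calling a function that takes immutable input.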
