On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
I've been working on RCStr (endearingly pronounced "Our Sister"), D's up-and-coming reference counted string type. The goals are:

<Slightly off-topic>

RCStr may be an easier first step, but I think generic dynamic arrays are more interesting, because they are more generally applicable and because user types like move-only resources make them a more challenging problem to solve.

BTW, what happened to scope? Generally speaking, I'm not a fan of Rust, and I know that you think that D needs to differentiate, but I like their borrowing model for several reasons: a) while not 100% safe and quite verbose, it offers enough improvements over @safe D to make it a worthwhile upgrade, even if you don't care about any other language features; b) it's not that hard to grasp, and almost natural for people familiar with C++11's copy (shared_ptr) and move (unique_ptr) semantics; c) it's general enough that it can be applied to areas like iterator invalidation, thread synchronization and other logic bugs, as some third-party Rust packages demonstrate.

I think that improving escape analysis with the scope attribute can go a long way toward narrowing the gap between Rust and D in that area.

The other elephant(s) in the room are nested contexts like delegates, nested structs and some alias template parameter arguments. These are especially bad because the user has zero control over those GC allocations, which makes some of D's key features unusable in @nogc contexts.
<End off-topic>


* Reference counted, shouldn't leak if all instances destroyed; even if not, use the GC as a last-resort reclamation mechanism.

* Entirely @safe.

* Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but also raw manipulation and custom encodings via RCStr!ubyte, RCStr!ushort etc.

* Support several views of the same string, e.g. given s of type RCStr!char, it can be iterated byte-wise, code point-wise, code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar etc.

* Support const and immutable qualifiers for the character type.

* Work well with const and immutable when they qualify the entire RCStr type.

* Fast: use the small string optimization and various other layout and algorithms to make it a good choice for high performance strings

RFC: what primitives should RCStr have?


Thanks,

Andrei

0) (Prerequisite) Composition/interaction with language features/user types - RCStr in nested contexts (alias template parameters, delegates, nested structs/classes), array of RCStr-s, RCStr as a struct/class member, RCStr passed as (const) ref parameter, etc. should correctly increase/decrease ref count. This is also a prerequisite for safe RefCounted!T. Action item: related compiler bugs should be prioritized. E.g. the RAII bug from Shachar Shemesh's lightning talk - http://forum.dlang.org/post/n8algm$qra$1...@digitalmars.com.
See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some nasty ones, like bad RVO codegen).

1) Safe slicing

2) shared overloads of member functions (e.g. for stuff like atomic incRef/decRef)

3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)

4) (Optional) Reserving (pre-allocating capacity) / shrinking. I labeled this feature request as optional, as it's not clear if RCStr is more like a container, or more like a slice/range.

5) Some sort of optimization for zero-terminated strings. Quite often one needs to interact with C APIs, which requires calling toStringz / toUTFz, which causes unnecessary allocations. It would be great if RCStr could efficiently handle this scenario.

6) !!! Not really a primitive, but we need to make sure that applying a chain of range transformations won't break ownership (e.g. leak or free prematurely).

7) Should be able to replace GC usage in transient ranges like e.g. File.byLine

8) Cheap initialization/assignment from string literals - should cost roughly the same as either initializing a static character array (if the small string optimization is used) or just making it point to read-only memory in the data segment of the executable. It shouldn't try to write to or free such memory. When initialized from a string literal, RCStr should also offer a null-terminating byte, provided that it points to the whole literal. If one wants to assign a string literal by overwriting parts of the already allocated storage, std.algorithm.mutation.copy should be used instead.

There may be other important primitives which I haven't thought of, but generally we should try to leverage std.algorithm, std.range, std.string and std.uni for them, via UFCS.
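To make points 5 and the UFCS remark a bit more concrete, here is a minimal sketch of what has to be done today with a plain string. It uses only existing Phobos and druntime functions, not RCStr (which doesn't exist yet); the helper name cLength is made up for this illustration. The point is that the byte, code unit and code point views already compose via UFCS, while handing an arbitrary slice to a C function goes through toStringz, which in general has to copy the data in order to append the terminator.

```d
import core.stdc.string : strlen;
import std.range : walkLength;
import std.string : representation, toStringz;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: the length a C API would see for `s`.
// toStringz returns a zero-terminated copy when it cannot prove
// that a terminator already follows the slice.
size_t cLength(const(char)[] s)
{
    const(char)* p = s.toStringz;
    return strlen(p);
}

void main()
{
    string s = "h\u00E9llo"; // 'é' takes two UTF-8 code units

    // Three views over the same data, all via UFCS.
    assert(s.representation.length == 6); // raw bytes
    assert(s.byCodeUnit.walkLength == 6); // UTF-8 code units
    assert(s.byDchar.walkLength == 5);    // code points

    // A slice is not zero-terminated in place, so C interop
    // needs a terminated copy of it.
    assert(cLength(s[0 .. 3]) == 3);
}
```

An RCStr that keeps a terminator at the end of its own storage could skip that copy whenever the whole string is passed, which is what I have in mind for point 5.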

----------

On a related note, I know that you want to use AffixAllocator for reference counting, and I think it's a great idea. I have one question, which wasn't answered during that discussion:

// Use a nightly build to compile
import core.thread : Thread, thread_joinAll;
import std.range : iota;
import std.experimental.allocator : makeArray;
import std.experimental.allocator.building_blocks.region : InSituRegion;
import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

AffixAllocator!(InSituRegion!(4096) , uint) tlsAllocator;

static assert (tlsAllocator.sizeof >= 4096);

import std.stdio;
void main()
{
    shared(int)[] myArray;

    foreach (i; 0 .. 100)
    {
        new Thread(
        {
            if (i != 0) return;

            myArray = tlsAllocator.makeArray!(shared int)(100.iota);
            static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
            writefln("At %x: %s", myArray.ptr, myArray);

        }).start();

        thread_joinAll();
    }

    writeln(myArray); // prints garbage!!!
}

So my question is: should it be possible to share thread-local data like this? IMO, the current allocator design opens a serious hole in the type system, because it allows using data allocated from another thread's thread-local storage. After the other thread exits, accessing memory allocated from its TLS should not be possible, but https://github.com/dlang/phobos/pull/3991 clearly allows that.

One should be able to allocate shared memory only from shared allocators. And shared allocators must be backed by shared parent allocators or shared underlying storage. In this case the Region allocator should be shared, and must be backed by shared memory, Mallocator, or something in that vein.
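For contrast, here is a rough sketch of the variant I would expect to be valid. It assumes Mallocator as the shared backing allocator and drops the affix/prefix part of the original example; the function name fillFromThread is made up for this illustration. Because the memory comes from the C heap rather than from the worker thread's TLS, it is still valid after that thread has exited.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: a worker thread allocates `n` shared ints
// from the (shared) Mallocator and fills them with 0 .. n.
shared(int)[] fillFromThread(size_t n)
{
    shared(int)[] result;

    auto t = new Thread({
        // Mallocator.instance is a shared allocator backed by malloc,
        // so the block does not live in this thread's TLS.
        void[] raw = Mallocator.instance.allocate(n * int.sizeof);
        auto arr = cast(shared(int)[]) raw;
        foreach (i, ref e; arr)
            atomicStore(e, cast(int) i);
        result = arr;
    });
    t.start();
    t.join(); // the worker thread is gone after this point

    return result;
}

void main()
{
    auto data = fillFromThread(4);

    // Still valid: the storage outlives the thread that allocated it.
    assert(data.length == 4);
    assert(atomicLoad(data[3]) == 3);

    Mallocator.instance.deallocate(cast(void[]) data);
}
```

The casts are there only because allocate and deallocate work with void[]; the important part is that nothing typed as shared ever points into a thread-local region.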
