Re: .NET on a string

Robert Jacques Tue, 17 Mar 2009 21:10:12 -0700

On Tue, 17 Mar 2009 13:03:56 -0400, Steven Schveighoffer <[email protected]> wrote:

On Tue, 17 Mar 2009 10:59:21 -0400, Georg Wrede <[email protected]> wrote:
Walter Bright wrote:
Cristi talks about adapting D strings to .NET.
http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?already_submitted=true
Actually, meant to ask earlier, but
"[...in .NET,] System.Array and System.ArraySegment are distinct, unrelated types. In D arrays slices are indistinguishable from arrays and this creates the problems that I mentioned in the interview."
Is this discussed somewhere?
The idea of slices and arrays being distinct types does seem to have advantages. I've seen a couple of mentions of this lately, but has there been a *rigorous* discussion?
There has been. But there are very good reasons to keep arrays and slices the same type. Even in C# and Java, a substring is the same type as a string. It allows iterative patterns such as:
str = str[1..$];
The main problem comes from having a substring overwrite data in another string via appending. From what I can tell, that is the *only* issue with arrays and slices being the same type. This problem can be solved in other ways. I've put forth 2 solutions that as far as I know were not proven to be infeasible, but which never received any attention in the NG from Walter or Andrei.
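As an illustration of the pattern and the aliasing described above, here is a minimal D sketch. The function names are hypothetical and exist only for this example. Whether the final append extends in place or reallocates depends on the runtime version, so that part is printed rather than asserted.

```d
import std.stdio : writeln;

/// Counts elements by repeatedly dropping the front one, as in `str = str[1..$]`.
size_t countByDroppingFront(string str)
{
    size_t steps = 0;
    while (str.length > 0)
    {
        str = str[1 .. $]; // a slice of a string is still a string
        ++steps;
    }
    return steps;
}

/// Writes through a slice and returns the original array, showing shared memory.
char[] writeThroughSlice()
{
    char[] whole = "hello world".dup;
    char[] head = whole[0 .. 5]; // a view into `whole`, not a copy
    head[0] = 'J';
    return whole;
}

void main()
{
    assert(countByDroppingFront("abc") == 3);
    assert(writeThroughSlice() == "Jello world");

    // The append case: the memory right after `head` still belongs to `whole`.
    // An in-place append would overwrite whole[5]; a reallocating one would not.
    char[] whole = "hello world".dup;
    char[] head = whole[0 .. 5];
    head ~= '!';
    writeln(head.ptr is whole.ptr ? "extended in place" : "reallocated");
    writeln(whole);
}
```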
There was mention of a separate T[new] type, before my time with D, but I think you still have the same aliasing problems there unless you zero out the source instance on assignment, or simply disallow assigning a T[new] from another T[new], which can affect the design of code, and make certain iterative patterns difficult if not impossible.
The two solutions I put forth are:
1. Storing the requested length of an allocated block in the GC. This not only allows for more intuitive appending, but could also help the GC when scanning for pointers (you might not have to zero out the block first). This proposal received zero responses.

Well, if it helps, my roommate and I just independently came up with this idea. (So is this great minds think alike or that simple ones seldom differ?)

2. Storing the requested length of an allocated array at the front of the array. I had a hackish scheme to use the most significant bit in the length field to identify if a slice is just beyond the allocated length. Therefore, an append operation would first check if it is the first element in a block, and if so check the allocated length before deciding whether to append in place or allocate a new block. There were some questions on the proposal, but I don't believe anyone found any killer problems with it. It has advantages and disadvantages over the first proposal and current implementation. The main advantage is that the append code does not have to query the GC to get the requested length.
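One possible reading of the flag-bit scheme in proposal 2, sketched in D. All names here are hypothetical, and the meaning given to the flag (the slice starts at the first element of its block) is an assumption drawn from the description above, not the actual proposal's code. Note that borrowing the top bit halves the largest representable length.

```d
// Assumption: the top bit of the length word marks "starts at the block's first element".
enum size_t startFlag = cast(size_t) 1 << (size_t.sizeof * 8 - 1);

/// Packs a length and the flag into one word.
size_t packLength(size_t length, bool atBlockStart)
{
    assert((length & startFlag) == 0, "length too large to carry the flag");
    return atBlockStart ? (length | startFlag) : length;
}

/// Recovers the real length by masking the flag off.
size_t plainLength(size_t packed)
{
    return packed & ~startFlag;
}

/// True when the flag bit is set.
bool startsBlock(size_t packed)
{
    return (packed & startFlag) != 0;
}

/// Append in place only if the slice starts the block and covers everything
/// allocated so far; otherwise a new block is needed.
bool canAppendInPlace(size_t packed, size_t allocatedLength)
{
    return startsBlock(packed) && plainLength(packed) == allocatedLength;
}

void main()
{
    size_t whole = packLength(8, true);  // slice covering all 8 allocated elements
    size_t head = packLength(4, true);   // shorter slice from the block start
    size_t tail = packLength(4, false);  // slice from the middle of the block

    assert(plainLength(whole) == 8);
    assert(canAppendInPlace(whole, 8));
    assert(!canAppendInPlace(head, 8));  // appending would overwrite elements 4..8
    assert(!canAppendInPlace(tail, 8));  // not at block start, so never in place
}
```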

You mentioned being unsure of the threading and alignment issues of your proposal in [2]. I see both a lock and a lock-free solution:


Lock solution:

To append to an array, one has to know the capacity, which requires taking the GC lock. If, as in [1], the used length and capacity are stored in the GC together, one could move the extension operation to the GC. i.e. given a block, the used length in that block and an array (consisting of a pointer into that block and a length) it is straightforward to calculate if that array could be lengthened and if not, allocate a new one. The caller can then do a full or partial copy as appropriate. This avoids the alignment issues (as no new information is stored in the block) and maintains the maximum array size (although this is a fairly minor corner case).
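A minimal sketch of that in-GC extension check, in D. `Block` and `ToyGC` are hypothetical stand-ins made up for this example and do not reflect the real runtime's data structures. The array can grow in place only if it ends exactly at the block's used length and the block has room.

```d
import core.sync.mutex : Mutex;

/// Toy stand-in for one GC block (hypothetical layout).
struct Block
{
    ubyte[] storage; // the whole block; storage.length is the capacity
    size_t used;     // number of bytes handed out so far
}

/// Toy allocator that owns the lock and performs the extension itself.
final class ToyGC
{
    private Mutex lock;

    this()
    {
        lock = new Mutex;
    }

    /// Grows `arr` by `extra` bytes in place if that cannot overwrite anyone
    /// else's data. Returns false when the caller must allocate and copy.
    bool tryExtend(ref Block block, ref ubyte[] arr, size_t extra)
    {
        lock.lock();
        scope (exit) lock.unlock();

        immutable offset = cast(size_t)(arr.ptr - block.storage.ptr);
        immutable endsAtUsed = offset + arr.length == block.used;
        immutable fits = block.used + extra <= block.storage.length;
        if (!endsAtUsed || !fits)
            return false;

        block.used += extra;
        arr = block.storage[offset .. offset + arr.length + extra];
        return true;
    }
}

void main()
{
    auto gc = new ToyGC;
    auto block = Block(new ubyte[](16), 8);
    ubyte[] whole = block.storage[0 .. 8];
    ubyte[] head = block.storage[0 .. 4];

    assert(!gc.tryExtend(block, head, 1)); // would overwrite whole[4]
    assert(gc.tryExtend(block, whole, 4)); // ends at used length, room left
    assert(whole.length == 12 && block.used == 12);
    assert(!gc.tryExtend(block, whole, 8)); // 12 + 8 exceeds the capacity of 16
}
```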


Lock-free solution:

This solution would make use of the 'empty' 16-byte block that precedes each allocated block. I'm assuming this is usable, but if it's not, an extra 16-byte block would have to be added to each heap array to maintain proper alignment. This solution would cache the array capacity and its used length just prior to the start of the array (i.e. in the 'empty' block). This is similar to [2], except that in [2] you still have to query the GC for capacity. Again, a bit flag in the array length determines slice / non-slice status. Regarding multi-threading issues: reading capacity suffers from publication safety issues (I think) but it might get cleared up by the releasing of the GC allocation lock. Otherwise, it would need to be fenced. As for the length, one can use a single compare and swap (comparing the GC's length with the local array's length and setting the new desired length). This also has the advantage of avoiding the GC when doing array appending, which decreases the need for separate ArrayBuilder classes. (Of course, once shared/local is introduced, the expensive fences and CAS instructions can be limited appropriately.)
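The compare-and-swap step might look roughly like this in D, using `cas` from `core.atomic`. `BlockHeader` and `tryClaim` are hypothetical names for this sketch. It models only the length update and leaves the publication-safety question for the capacity aside.

```d
import core.atomic : atomicLoad, atomicStore, cas;

/// Hypothetical header cached just before the array data.
struct BlockHeader
{
    size_t capacity;    // assumed to be written once, at allocation
    shared size_t used; // length claimed so far, updated with CAS
}

/// Tries to grow an array that currently ends at `localEnd` by `extra` elements.
/// The swap succeeds only if the stored used length still equals `localEnd`,
/// i.e. nobody else has appended past this array in the meantime.
bool tryClaim(ref BlockHeader header, size_t localEnd, size_t extra)
{
    if (localEnd + extra > header.capacity)
        return false; // no room: the caller allocates a new block
    return cas(&header.used, localEnd, localEnd + extra);
}

void main()
{
    auto header = BlockHeader(16);
    atomicStore(header.used, cast(size_t) 8);

    assert(!tryClaim(header, 4, 1)); // a slice ending at 4 does not own the tail
    assert(tryClaim(header, 8, 4));  // ends at the used length: claim succeeds
    assert(atomicLoad(header.used) == 12);
    assert(!tryClaim(header, 8, 4)); // stale view of the length: claim fails
    assert(!tryClaim(header, 12, 8)); // 12 + 8 exceeds the capacity of 16
}
```

A failed claim is not an error; it just means the append falls back to allocating a new block and copying.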

Aside from the append problem, I see no other drawbacks to having strings the way they are.
References:
Proposal 1: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146
Proposal 2: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437
-Steve
