[Felix-language] strings

john skaller Tue, 16 Sep 2014 16:13:47 -0700

Consider 

        struct FS { char const *data; int len; }


which is layout compatible with RE2 StringPiece.  Note
the "char" type is most usual but in principle could be any T.
FS are immutable.

Construction:

1. From a NTBS (C char array) literal: 
        * set the length with strlen

2. From a buffer, NTBS:
        *set the length with strlen
        * allocate array on Felix heap
        * copy buffer to heap
 
3. From a buffer, length given:
        * allocate on Felix heap
        * copy buffer to heap

4. From a NTBS array, transfer ownership:
        * calculate length with strlen
        * pass ownership to GC (length wrong but not matter)

5. From C array and length, transfer ownership:
        * pass ownership to GC

6. Substring of an FS:
        * set start pointer and length
        * with checking on bounds

7. Concatenation of FS:
        * make new heap object

With this model, copying and substring operations are 
constant time and quite fast. Scanning is fast.

Concatenation is slow. Mutation is not allowed at all.
Building an FS character by character would be a disaster.
Instead one should use a varray, darray, C++ string, vector,
or other method first.

<IMPLEMENTATION>

if implemented in C++, these object will have to be passed by
a pointer to FS. A Felix implementation, however, could copy
the FS directly. 

This is because Felix currently provides no
way to specify the offsets of pointers into arbitrary C++ objects.
These IS a way to specify that a pointer is a pointer though.
I think this might work for StringPiece because there is only
one pointer at offset 0.

It may be possible to use a Felix implementation and hack it into
RE2 StringPiece with a cast since they'd be layout compatible.
A StringPiece can also be cast into a plain char* for the same
reason: the array comes first, before the length. Of course
all this breaks strict aliasing rules but the casting gains a lot
more than the compiler optimisations]

</IMPLEMENTATION>

Conversion from C++ string, as noted, requires a heap allocation.
Conversion to C++ string probably requires the same. 
This is a bug in the design of C++ strings but that's the whole issue here.

Now consider a rope to speed concatenation.
Again, purely functional and immutable.

Internally we would use a tree of FS. This reduces the time
of concatenation to log N I think. RAlist is faster at O(1) but
only if the pieces are stored backwards.

Scanning characters sequentially would still be roughly
constant time per character, since most operations would simply
scan inside the FS arrays. however random access would be
slow at log N since we have travel down the tree branches.
For performance the branches wold have to be labelled with
the total string length they cover and the  tree would have to be
balanced.

OPTIMAL?

to get optimal performance, the compiler really needs to see that
for concatenation there is a whole string of temporaries. We would
need linear typing for that.

But there might be another way: the compiler DOES know when it
has to create a temporary. However it cannot reorganise an operation
without knowing its properties, in particular, if the operation is associative.
The trick here might be for the compiler to convert a such a chain
into a fold, and allow the user to specialise the fold. IN other words

        a + b + c + d + e ==> fold_left + a list (b,c,e)

The RHS slower than the LHS as written (with a list fold) but could
be much faster if (fold_left +) could be specialised: in that case
we could count the string lengths, reserve enough space, and copy
everything once into the new buffer.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

[Felix-language] strings

Reply via email to