Consider struct FS { char const *data; int len; }
which is layout compatible with RE2 StringPiece. Note the "char" type is most usual but in principle could be any T. FS are immutable. Construction: 1. From a NTBS (C char array) literal: * set the length with strlen 2. From a buffer, NTBS: *set the length with strlen * allocate array on Felix heap * copy buffer to heap 3. From a buffer, length given: * allocate on Felix heap * copy buffer to heap 4. From a NTBS array, transfer ownership: * calculate length with strlen * pass ownership to GC (length wrong but not matter) 5. From C array and length, transfer ownership: * pass ownership to GC 6. Substring of an FS: * set start pointer and length * with checking on bounds 7. Concatenation of FS: * make new heap object With this model, copying and substring operations are constant time and quite fast. Scanning is fast. Concatenation is slow. Mutation is not allowed at all. Building an FS character by character would be a disaster. Instead one should use a varray, darray, C++ string, vector, or other method first. <IMPLEMENTATION> if implemented in C++, these object will have to be passed by a pointer to FS. A Felix implementation, however, could copy the FS directly. This is because Felix currently provides no way to specify the offsets of pointers into arbitrary C++ objects. These IS a way to specify that a pointer is a pointer though. I think this might work for StringPiece because there is only one pointer at offset 0. It may be possible to use a Felix implementation and hack it into RE2 StringPiece with a cast since they'd be layout compatible. A StringPiece can also be cast into a plain char* for the same reason: the array comes first, before the length. Of course all this breaks strict aliasing rules but the casting gains a lot more than the compiler optimisations] </IMPLEMENTATION> Conversion from C++ string, as noted, requires a heap allocation. Conversion to C++ string probably requires the same. This is a bug in the design of C++ strings but that's the whole issue here. Now consider a rope to speed concatenation. Again, purely functional and immutable. Internally we would use a tree of FS. This reduces the time of concatenation to log N I think. RAlist is faster at O(1) but only if the pieces are stored backwards. Scanning characters sequentially would still be roughly constant time per character, since most operations would simply scan inside the FS arrays. however random access would be slow at log N since we have travel down the tree branches. For performance the branches wold have to be labelled with the total string length they cover and the tree would have to be balanced. OPTIMAL? to get optimal performance, the compiler really needs to see that for concatenation there is a whole string of temporaries. We would need linear typing for that. But there might be another way: the compiler DOES know when it has to create a temporary. However it cannot reorganise an operation without knowing its properties, in particular, if the operation is associative. The trick here might be for the compiler to convert a such a chain into a fold, and allow the user to specialise the fold. IN other words a + b + c + d + e ==> fold_left + a list (b,c,e) The RHS slower than the LHS as written (with a list fold) but could be much faster if (fold_left +) could be specialised: in that case we could count the string lengths, reserve enough space, and copy everything once into the new buffer. -- john skaller skal...@users.sourceforge.net http://felix-lang.org ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce. Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language