On Sat, Oct 16, 2010 at 1:21 AM, Ben Kloosterman <[email protected]> wrote:
> 1) For adding strings, isn't it an issue if you add, say, 1 UCS-1,
> then 1 UCS-4, then 1 UCS-1, due to the "tree overhead"? So you will need
> to parse and re-encode? Do you create strings from chars as UCS-4 in a
> string builder, or will the string builder change to UCS-4 when needed?

Last night I thought so, but then I realized that a smarter encoding is possible. The strands are contiguous series of code units. The string object itself could be just a contiguous series of (offset, strand pointer) pairs. The code unit size of the strand is encoded in the pair somewhere. No actual tree is required. You binary-search the offsets and then index into the discovered strand. If you maintain a current offset marker within the string representation, lookups that land in the most recently used strand skip the search, so you'll get O(1) most of the time.

> 2) Let me confirm this: each strand is a separate heap object,
> right?

It lives in the heap, but it is not a first-class object.

> · UTF-8 with char indexes: O(n) scans, so not suitable for
> medium to long strings. Risk on medium to long stream performance.

Unless you keep an associated side table of (offset, run-size) pairs.

> · UTF-8 with binary index: O(1) lookups. Issues with backward
> compatible API / algorithms and developers needing to understand its bytes.
> Not great for long strings due to large memory object.

Not sure why this creates a compatibility issue, since the index is internal to the runtime layer's implementation.

> Risk on backward compatibility and developer incorrect usage.
>
> o Use operator overloading on +; will require a typedef of the index
> type and could remove some of the compatibility issues.
>
> o Value type option. Will still require a reference type string, e.g.
> longstring, so can only be considered if we do that. Useful in cases where
> you don't have/want a heap.

As I have said elsewhere, I strongly oppose a LongString/ShortString distinction. It cannot be used correctly.
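
To make the (offset, strand pointer) layout from the reply to point 1 concrete, here is a rough Python sketch. It is an illustration only, under stated assumptions: the names `Strand`, `StrandString`, and `cursor` are hypothetical, Python `bytes` stand in for raw code units, each piece is assumed non-empty and free of surrogates, and none of this reflects an actual runtime implementation.

```python
from bisect import bisect_right

# Hypothetical mapping from code unit width (bytes) to a fixed-width codec.
CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}


class Strand:
    """A contiguous run of code units that all share one width."""

    def __init__(self, text):
        top = max(map(ord, text))  # assumes text is non-empty
        self.width = 1 if top < 0x100 else 2 if top < 0x10000 else 4
        self.units = text.encode(CODECS[self.width])
        self.length = len(text)

    def code_point(self, i):
        start = i * self.width
        return int.from_bytes(self.units[start:start + self.width], "little")


class StrandString:
    """A flat table of (offset, strand) pairs; no tree is involved."""

    def __init__(self, pieces):
        self.offsets = []   # starting index of each strand, ascending
        self.strands = []
        total = 0
        for piece in pieces:
            self.offsets.append(total)
            self.strands.append(Strand(piece))
            total += len(piece)
        self.length = total
        self.cursor = 0     # index of the most recently used strand

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        k = self.cursor
        start = self.offsets[k]
        # Fast path: the index falls in the strand we used last time.
        if not start <= index < start + self.strands[k].length:
            # Slow path: binary search over the offsets table.
            k = bisect_right(self.offsets, index) - 1
            self.cursor = k
            start = self.offsets[k]
        return chr(self.strands[k].code_point(index - start))


# The UCS-1, UCS-4, UCS-1 case from the question: three strands, no re-encoding.
s = StrandString(["a", "\U0001F600", "b"])
chars = [s[i] for i in range(s.length)]
```

Sequential access stays on the fast path within a strand and pays for one binary search at each strand boundary. Whether a real runtime would keep the cursor in the string object itself is a design question this sketch does not settle.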
> · Strand use: benefits for large strings and GC; some concern
> over small-string overhead and how it will work in practice, e.g. aggregating
> mixed strings and how to create strings efficiently. Risk is for small
> string performance and possibly creating strings.

I can think of several other options. The point is that all of these are a matter that is purely internal to the runtime. How the string is internally implemented isn't part of the specification, and must not be.

shap
