>> So first, I think this is the wrong way to prioritize as a matter of
>> defaults, but second, I think I've already made it clear that no either/or
>> choice is actually required. The "stranded string" approach does all of what
>> you want and more. The O(log n) factor issue is more than compensated for by
>> the improvement in D-cache and D-TLB utilization.
>
> I wasn't sure you'd be willing to accept the overhead of the built-in
> string type being a type-class / capsule / interface / whatever, or
> that you would be comfortable with the default string data type being
> a more complicated structure (ropes, indexed strings, strings with
> extents, etc). If they are, the emphasis on the representation working
> well for all situations is less important to me. If you don't have any
> reservations about the extra overhead from that abstraction (compared
> to their C equivalents), then I don't imagine anyone will.

Another option: instead of, say, string and StringBuilder (a .NET class
that builds strings efficiently using an internal array, with a mutable
array for the last part, which is useful for printf-style formatting),
you could have String and LargeString, with String being a lean and mean
byte-indexed UTF-8 string and LargeString as discussed.

LargeString:
- Uses a tree.
- Easily supports custom indexes such as lines.
- Supports mutation by adding to the last array.
- Gives efficient backward compatibility with char/code point index APIs.
- The tree could support all UTF-16, all UCS-4, or mixed, depending on a
  mode. E.g. for memory conservation the default is mixed, but for interop
  you can set the representation to UCS-2 or UCS-4.

The internal lib could overload both where appropriate (or maybe even use
a nasty hack on the types for pseudo no-cost inheritance).

IMHO mutability is not really a big issue on large strings, as these often
justify a lock if one is required, as long as the cheap string is used for
messages etc.

If such a scheme is adopted, it may be worth considering making the UTF-8
string a value type, as most short strings, e.g. under 8-16 chars (UTF-8
here, so 8-16 bytes), are probably quicker to pass by value with no heap
overhead; break-even is probably around 24-32 chars/bytes. I'm not sure
whether the stack space and lib concerns here (overloading or inheritance
hack, casting) are worth it, but it would be very good for all those
common short strings.

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
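
For illustration only, here is a rough Python sketch of the LargeString
half of that split. It is not a proposed BitC design. It assumes a flat,
sorted list of UTF-8 chunks searched by binary search as a stand-in for
the tree, and the names chunk_bytes, append, char_at and line_of are
hypothetical. It is meant to show three of the points above: a mutable
last array, a code point index API, and a custom line index.

```python
import bisect


class LargeString:
    """Chunked UTF-8 text with per-chunk code point and line counts.

    Illustrative only: a flat chunk list stands in for the tree.
    """

    def __init__(self, text="", chunk_bytes=64):
        # chunk_bytes is a hypothetical tuning knob, not a measured value.
        self._chunk_bytes = chunk_bytes
        self._chunks = [bytearray()]  # the last chunk is the mutable append buffer
        self._cum_points = [0]        # code points in chunks[0..i], inclusive
        self._cum_lines = [0]         # newlines in chunks[0..i], inclusive
        self.append(text)

    def append(self, text):
        """Append text; only the last chunk is ever mutated."""
        for ch in text:
            encoded = ch.encode("utf-8")
            if len(self._chunks[-1]) + len(encoded) > self._chunk_bytes:
                # Seal the full chunk and start a new append buffer.
                # Whole characters are appended, so no code point is split.
                self._chunks.append(bytearray())
                self._cum_points.append(self._cum_points[-1])
                self._cum_lines.append(self._cum_lines[-1])
            self._chunks[-1] += encoded
            self._cum_points[-1] += 1
            if ch == "\n":
                self._cum_lines[-1] += 1

    def __len__(self):
        return self._cum_points[-1]

    def _locate(self, index):
        """Return (chunk number, code point offset inside that chunk)."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Binary search over cumulative counts: O(log number_of_chunks).
        i = bisect.bisect_right(self._cum_points, index)
        before = self._cum_points[i - 1] if i else 0
        return i, index - before

    def char_at(self, index):
        """Code point at `index`, for char/code point index APIs."""
        i, offset = self._locate(index)
        # Decoding one small chunk is a short, bounded scan.
        return self._chunks[i].decode("utf-8")[offset]

    def line_of(self, index):
        """Zero-based line number containing code point `index`."""
        i, offset = self._locate(index)
        lines_before = self._cum_lines[i - 1] if i else 0
        return lines_before + self._chunks[i].decode("utf-8")[:offset].count("\n")


if __name__ == "__main__":
    s = LargeString("ab\ncdé\nfg", chunk_bytes=4)
    print(len(s), s.char_at(5), s.line_of(7))
```

The cheap String side needs no such machinery: in this sketch it would
just be an immutable bytes value indexed by byte offset, which is the
point of keeping the two types separate.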
