On 16/03/2011 6:02 PM, Jonathan S. Shapiro wrote: > Also note most Replace and find have a start index to start from and > usage of this is very common on mid to large size strings. > > Yes. This is the case where O(1) indexing is relevant - to find start of > string. But it's also a case where you are committed to making a > procedure call anyway, so front-loading the find/replace with a > sufficiently brief traversal to find start of string may not be the end > of the world.
It seems the desire to view strings as vectors is hampering the ability to define an efficient, general API. IIRC, you've essentially concluded that either: 1. flat/vector representations are not feasible given the character size overheads necessary to represent Unicode. 2. non-vector implementations are undesirable because indexing is not O(1). The engineering question is: which of these costs is more easily amortized or elimited through judicious runtime assistance, type hackery, or punting to the user? Indexing seems to have solutions in defining an API where indexing is discouraged in favour of iterators or "validated indexes" [1]. My skimming sees a few proposals of this from C++. For #1, we can perhaps amortize character size overheads by storing string encoding in a header somehow. A string then becomes a sum: type string = Uint8 of byte[] | UInt16 of uint16[] | UInt32 of uint32[] A string concatenation operation widens string encoding to the larger of the two strings. A substring operation can optionally compact the representation by scanning the string, otherwise it leaves it as-is. Perhaps also specialization can compact representations at compile time. You get your strings-as-vectors view, with the same costs plus a few branches, and it will cost you 2 bits in a header or pointer somewhere to tag the cases. I personally would favour #2 because strings are their own abstraction separate from vectors. Indexing is a historical artifact of a certain string API. Sandro [1] I'm thinking of something along the lines of "lightweight static capabilities", http://lambda-the-ultimate.org/node/1635 _______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
