On 16/03/2011 6:02 PM, Jonathan S. Shapiro wrote:
>     Also note most Replace and find have a start index to start from and
>     usage of this is very common on mid to large size strings.
> 
> Yes. This is the case where O(1) indexing is relevant - to find start of
> string. But it's also a case where you are committed to making a
> procedure call anyway, so front-loading the find/replace with a
> sufficiently brief traversal to find start of string may not be the end
> of the world.

It seems the desire to view strings as vectors is hampering the ability
to define an efficient, general API. IIRC, you've essentially concluded
that either:

1. flat/vector representations are not feasible given the character size
overheads necessary to represent Unicode.

2. non-vector implementations are undesirable because indexing is not O(1).

The engineering question is: which of these costs is more easily
amortized or elimited through judicious runtime assistance, type
hackery, or punting to the user?

Indexing seems to have solutions in defining an API where indexing is
discouraged in favour of iterators or "validated indexes" [1]. My
skimming sees a few proposals of this from C++.

For #1, we can perhaps amortize character size overheads by storing
string encoding in a header somehow. A string then becomes a sum:

type string = Uint8 of byte[] | UInt16 of uint16[] | UInt32 of uint32[]

A string concatenation operation widens string encoding to the larger of
the two strings. A substring operation can optionally compact the
representation by scanning the string, otherwise it leaves it as-is.
Perhaps also specialization can compact representations at compile time.
You get your strings-as-vectors view, with the same costs plus a few
branches, and it will cost you 2 bits in a header or pointer somewhere
to tag the cases.

I personally would favour #2 because strings are their own abstraction
separate from vectors. Indexing is a historical artifact of a certain
string API.

Sandro

[1] I'm thinking of something along the lines of "lightweight static
capabilities", http://lambda-the-ultimate.org/node/1635

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to