On UCS-2: Sure, nobody likes it, codepoints vs code units, blah blah. But if you disallow the poorly typed indexing operations we are talking about here, the user need be none the wiser, as far as I can see.
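To make "disallow the poorly typed indexing operations" a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: the names `StrIndex` and `Utf8String` are made up for this mail, the storage happens to be UTF-8, and nothing here is meant as the actual BitC design. The point is just that the only way to get an index is from the string itself, and the only way to move it is via the string, so the encoding never leaks to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrIndex:
    """Opaque position in a Utf8String (hypothetical name).

    It wraps a byte offset, but callers are not meant to do arithmetic on it.
    """
    _offset: int


class Utf8String:
    """String stored as UTF-8 bytes, with no integer indexing (hypothetical)."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    def start(self) -> StrIndex:
        return StrIndex(0)

    def end(self) -> StrIndex:
        return StrIndex(len(self._data))

    def _width(self, offset: int) -> int:
        # The length of a UTF-8 sequence is determined by its lead byte alone.
        lead = self._data[offset]
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        return 4  # 0b11110xxx; the data is valid because we encoded it ourselves

    def next(self, i: StrIndex) -> StrIndex:
        # Constant-time step: inspect one byte, skip one whole character.
        return StrIndex(i._offset + self._width(i._offset))

    def char_at(self, i: StrIndex) -> str:
        o = i._offset
        return self._data[o:o + self._width(o)].decode("utf-8")

    def substring(self, a: StrIndex, b: StrIndex) -> str:
        # Both indexes sit on character boundaries, so a byte slice is safe.
        return self._data[a._offset:b._offset].decode("utf-8")


# Walk a string containing 1-, 2-, 3- and 4-byte characters.
s = Utf8String("a\u00e9\u20ac\U0001F600z")
i = s.start()
chars = []
while i != s.end():
    chars.append(s.char_at(i))
    i = s.next(i)
```

Each call to `next` looks at a single lead byte, so stepping is constant time, and `substring` never has to rescan from the start of the string. A UTF-16 instance would look the same from the outside, with `_width` checking for a high surrogate instead.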
Given a strongly-typed index into a string, unless the encoding is UTF-32, you are going to need some logic to, e.g., detect surrogate pairs in UCS-2 (strictly speaking UTF-16, since UCS-2 proper has no surrogates) or determine the length of a character in UTF-8. We seem to have agreed that this logic is cheap enough that it *may* be a worthwhile trade-off for the reduction in cache usage in common workloads, and that this is worth benchmarking.

Where typesafe indexes (such as iterators in the non-vector case) are used, which seems to be possible in the examples mentioned (regexp search, substring), we are probably always talking about O(1) in the typical case, so I don't know where either of you are taking this discussion.

And of course, you will want to allocate the indexes on the stack as far as iteration goes, but given different instances of the String typeclass you don't know how large they will have to be. One way to deal with this is to make the iteration machinery special, as in Go or Common Lisp.

Is there some performance issue I'm not considering here?

--
William Leslie
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
