On Tue 15 Mar 2011 23:49, Mark H Weaver <m...@netris.org> writes:

>> Well, we covered O(1) vs O(n). To make UTF-8 O(1), you need to store
>> additional indexing information of some sort. There are various schemes,
>> but, depending on the scheme, you lose some of the memory advantage of
>> UTF-8 vs UTF-32. You can likely do better than UTF-32, though.
>
> I would prefer to either let our accessors be O(n), or else to create
> the index lazily, i.e. on the first usage of string-ref or string-set!
> In such a scheme, very few strings would include indices, and thus the
> overhead would be minimal.
>
> Anyway, the index overhead can be made arbitrarily small by increasing
> the chunk size. It is a classic time-space trade-off here. The chunk
> size could be made larger over the years, as usage of string-ref and
> string-set! becomes less common, and eventually the index stuff could be
> removed entirely.
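To make the lazy chunked index concrete, here is a minimal sketch in Python. It is not Guile's implementation and not a proposal for its internals; the class name `Utf8String`, the `chunk_size` parameter and the `string_ref` method are all hypothetical names chosen for illustration. It stores one byte offset per `chunk_size` code points and builds that table only on the first indexed access.

```python
class Utf8String:
    """UTF-8 bytes plus a lazily built table of chunk start offsets.

    Illustrative only: names and layout are assumptions, not Guile's.
    """

    def __init__(self, text, chunk_size=4):
        self.data = text.encode("utf-8")
        self.chunk_size = chunk_size  # code points per chunk (hypothetical knob)
        self.index = None             # not built until the first string_ref

    @staticmethod
    def _is_lead(byte):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts
        # a new code point.
        return (byte & 0xC0) != 0x80

    def _next(self, pos):
        # Byte offset of the code point following the one at pos.
        pos += 1
        while pos < len(self.data) and not self._is_lead(self.data[pos]):
            pos += 1
        return pos

    def _build_index(self):
        # Single O(n) pass: remember the byte offset of every
        # chunk_size-th code point.
        index = []
        count = 0
        for offset, byte in enumerate(self.data):
            if self._is_lead(byte):
                if count % self.chunk_size == 0:
                    index.append(offset)
                count += 1
        self.index = index

    def string_ref(self, k):
        if k < 0:
            raise IndexError(k)
        if self.index is None:
            self._build_index()
        chunk, skip = divmod(k, self.chunk_size)
        if chunk >= len(self.index):
            raise IndexError(k)
        pos = self.index[chunk]
        # Walk forward at most chunk_size - 1 code points inside the chunk.
        for _ in range(skip):
            pos = self._next(pos)
            if pos >= len(self.data):
                raise IndexError(k)
        return self.data[pos:self._next(pos)].decode("utf-8")
```

After the first call, each lookup costs one table access plus a walk over at most `chunk_size - 1` code points, and the table holds roughly one offset per `chunk_size` code points. That is the time-space trade-off in question: a larger chunk size shrinks the table and lengthens the walk.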
Though I agree that string-set! should be discouraged -- as Clinger also
thought back in 1984, it seems -- string-ref is still important. The
only thing that could replace it would be some sort of string cursor /
iteration protocol, and I would prefer for that to be standard (SRFI or
otherwise).

So let's factor string-ref into the "costs" of a potential switch to
UTF-8, be it in space or in time or whatever.

Andy
-- 
http://wingolog.org/