On 15 October 2010 06:53, Jonathan S. Shapiro <[email protected]> wrote:
> On Wed, Oct 13, 2010 at 4:31 PM, William Leslie
> <[email protected]> wrote:
>>
>> I mean to say that the in-memory format should favour efficiency of
>> iteration and slicing rather than space efficiency. Space efficient
>> representations can be reserved for serialisation. UTF-8 is a
>> fantastic wire format, and it's great on disk, but the space-saving
>> advantages are less important once you are in-memory.
>
> So you're okay with reducing the D-cache and D-TLB performance on
> large-scale programs, and therefore their overall performance, by a factor
> of >4? That seems a bit over-purist to me.

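For concreteness, here is a rough Python sketch of the trade-off being argued over. The helper names utf8_char_at and utf32_char_at are made up for illustration, and the UTF-8 version assumes well-formed input. It only shows the shape of the costs (a linear scan versus a 4x footprint for ASCII text), not how any real string library does it:

```python
def utf8_char_at(buf: bytes, index: int) -> str:
    """Return code point number `index` from UTF-8 bytes by scanning: O(n).

    Assumes `buf` is well-formed UTF-8 (no validation is done here).
    """
    if index < 0:
        raise IndexError(index)
    pos = 0   # byte offset into buf
    seen = 0  # code points passed so far
    while pos < len(buf):
        lead = buf[pos]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if seen == index:
            return buf[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


def utf32_char_at(buf: bytes, index: int) -> str:
    """Return code point number `index` from UTF-32-LE bytes: O(1)."""
    if index < 0 or 4 * index + 4 > len(buf):
        raise IndexError(index)
    return buf[4 * index:4 * index + 4].decode("utf-32-le")


if __name__ == "__main__":
    ascii_text = "hello"
    # For pure ASCII text the fixed-width form is four times larger.
    print(len(ascii_text.encode("utf-8")), len(ascii_text.encode("utf-32-le")))
```

The fixed-width form buys constant-time indexing at the price of the larger footprint, which is where the D-cache and D-TLB concern comes from.
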
I guess it was a bit short-sighted. In particular, most of the string
usage of a program is going to be short strings, and for short strings
the linear time complexity of the indexing and slicing operations is
going to be pretty inconsequential. And if you're going to be doing
some fairly index-heavy operations, or implementing a VM, and the
string type is a type-class or interface, you can always write your
own stream readers that convert to your preferred format before the
strings become app-level objects, and the built-in string libraries
will be mostly none the wiser.

> So first, I think this is the wrong way to prioritize as a matter of
> defaults, but second, I think I've already made it clear that no either/or
> choice is actually required. The "stranded string" approach does all of what
> you want and more. The O(log n) factor issue is more than compensated for by
> the improvement in D-cache and D-TLB utilization.

I wasn't sure you'd be willing to accept the overhead of the built-in
string type being a type-class / capsule / interface / whatever, or
that you would be comfortable with the default string data type being
a more complicated structure (ropes, indexed strings, strings with
extents, etc). If you are, the emphasis on the representation working
well for all situations is less important to me. If you don't have
any reservations about the extra overhead from that abstraction
(compared to their C equivalents), then I don't imagine anyone will.

--
William Leslie
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
