Re: [bitc-dev] Unicode and bitc

William Leslie Thu, 14 Oct 2010 17:58:28 -0700

On 15 October 2010 06:53, Jonathan S. Shapiro <[email protected]> wrote:
> On Wed, Oct 13, 2010 at 4:31 PM, William Leslie
> <[email protected]> wrote:
>>
>> I mean to say that the in-memory format should favour efficiency of
>> iteration and slicing rather than space efficiency. Space efficient
>> representations can be reserved for serialisation. UTF-8 is a
>> fantastic wire format, and it's great on disk, but the space-saving
>> advantages are less important once you are in-memory.
>
> So you're okay with reducing the D-cache and D-TLB performance on
> large-scale programs, and therefore their overall performance, by a factor
> of >4? That seems a bit over-purist to me.


I guess it was a bit short-sighted. In particular, most of the strings
a program handles are going to be short, and for short strings the
linear time complexity of the indexing and slicing operations is going
to be pretty inconsequential.
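
To make the "linear time" point concrete, here is a minimal sketch of
indexing by code point into a UTF-8 buffer. The function name
utf8_code_point_at is made up for illustration and is not anything from
BitC; it assumes the buffer is valid UTF-8. The scan has to walk from
the start because code points are 1 to 4 bytes wide, which costs next
to nothing on a short string.

```python
def utf8_code_point_at(buf: bytes, index: int) -> str:
    """Return the index-th code point of `buf` (assumed valid UTF-8).

    Hypothetical helper for illustration: it scans from the start of the
    buffer, so the cost is linear in the byte offset of the result.
    """
    if index < 0:
        raise IndexError(index)
    seen = -1      # index of the most recent code point whose first byte we passed
    start = None   # byte offset where the requested code point begins
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            if start is not None:
                # The next code point begins here, so the requested one is complete.
                return buf[start:pos].decode("utf-8")
            seen += 1
            if seen == index:
                start = pos
    if start is not None:
        # The requested code point was the last one in the buffer.
        return buf[start:].decode("utf-8")
    raise IndexError(index)


data = "héllo €".encode("utf-8")
second = utf8_code_point_at(data, 1)   # the two-byte 'é'
last = utf8_code_point_at(data, 6)     # the three-byte '€'
```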

And if you're going to be doing some fairly index-heavy operations, or
implementing a VM, and the string type is a type-class or interface,
you can always write your own stream readers that convert to your
preferred format before the strings become app-level objects and the
built-in string libraries will be mostly none the wiser.
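
As a rough sketch of what I mean by a converting stream reader: the
snippet below reads UTF-8 bytes in chunks and stores one fixed-width
integer per code point, so later indexing is constant time. The name
read_fixed_width and the choice of a 4-byte-or-wider array element are
my own assumptions for illustration, not a proposed BitC API.

```python
import codecs
import io
from array import array


def read_fixed_width(stream, chunk_size=4096):
    """Decode a UTF-8 byte stream into a fixed-width array of code points.

    Hypothetical helper: array type "L" is an unsigned integer of at
    least 4 bytes, wide enough for any code point up to U+10FFFF.
    """
    # The incremental decoder buffers a code point that is split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = array("L")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out.extend(ord(ch) for ch in decoder.decode(chunk))
    # Raises UnicodeDecodeError if the stream ended mid code point.
    decoder.decode(b"", final=True)
    return out


# A tiny chunk size forces multi-byte code points to straddle reads.
wire = io.BytesIO("héllo €".encode("utf-8"))
code_points = read_fixed_width(wire, chunk_size=3)
euro = code_points[6]   # O(1) lookup of the code point for '€'
```

The price is the one Jonathan points out: ASCII-heavy text takes at
least four times the space of its UTF-8 form.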

> So first, I think this is the wrong way to prioritize as a matter of
> defaults, but second, I think I've already made it clear that no either/or
> choice is actually required. The "stranded string" approach does all of what
> you want and more. The O(log n) factor issue is more than compensated for by
> the improvement in D-cache and D-TLB utilization.

I wasn't sure you'd be willing to accept the overhead of the built-in
string type being a type-class / capsule / interface / whatever, or
that you would be comfortable with the default string data type being
a more complicated structure (ropes, indexed strings, strings with
extents, etc). If you are, the emphasis on the representation working
well for all situations is less important to me. If you don't have any
reservations about the extra overhead from that abstraction (compared
to their C equivalents), then I don't imagine anyone will.
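
For the sake of illustration, here is one possible shape for such a
"more complicated structure": UTF-8 strands plus a table of the code
point index at which each strand starts. This is only my guess at the
general idea, not a description of the stranded strings Jonathan has in
mind, and the class name StrandedString is hypothetical. Lookup is a
binary search over the strands followed by a decode of one short
strand.

```python
from bisect import bisect_right


class StrandedString:
    """Illustrative indexed string: UTF-8 strands with a start-index table."""

    def __init__(self, pieces):
        self.strands = []   # each strand stored compactly as UTF-8 bytes
        self.starts = []    # code point index where each strand begins
        self.length = 0     # total length in code points
        for piece in pieces:
            if not piece:
                continue    # skip empty pieces so start indexes stay distinct
            self.starts.append(self.length)
            self.strands.append(piece.encode("utf-8"))
            self.length += len(piece)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Binary search: the last strand whose start is <= index.
        s = bisect_right(self.starts, index) - 1
        # Only this one strand is decoded, so the scan is bounded by strand size.
        return self.strands[s].decode("utf-8")[index - self.starts[s]]


text = StrandedString(["héllo", " ", "€uro"])
first_of_last_strand = text[6]   # found via the table, then one strand decoded
```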

-- 
William Leslie
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
