Re: [bitc-dev] Unicode and bitc

wren ng thornton Thu, 14 Oct 2010 19:36:38 -0700

On 10/14/10 10:09 PM, Ben Kloosterman wrote:
> On 10/14/10 3:51 PM, Jonathan S. Shapiro wrote:
>> Further, in practice, the number of strands tends to be small, so the
>> difference between O(log n) and O(1) is negligible.
>
> I'm not sure this is true. For example, in all languages "<", ">", "." (point)
> and numbers are ASCII. In Chinese, a Y-M-D date mixes Chinese characters and
> ASCII numerals. In fact, in nearly all languages you have UCS-2 code points
> interspersed with ASCII numbers and punctuation. So you would need some sort
> of more complex encoding in which runs shorter than length n stay in the
> wider (higher) encoding form. This is also good because short strings would
> not need a tree and hence incur no cost.
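Ben's threshold idea can be sketched as a two-pass segmentation: first split the text into maximal runs of equal strand width, then promote any narrow run shorter than min_run into a wider neighbouring strand. This is a minimal Python sketch, assuming strand widths of 1 (ASCII), 2 (UCS-2) and 4 (UCS-4) bytes; width_of, segment, byte_size and min_run are hypothetical names, not part of any BitC proposal.

```python
def width_of(ch: str) -> int:
    """Bytes per code point in the narrowest fixed-width strand that can hold ch."""
    cp = ord(ch)
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x10000:
        return 2  # Basic Multilingual Plane (UCS-2)
    return 4      # supplementary planes (UCS-4)


def segment(text: str, min_run: int = 1) -> list[tuple[int, str]]:
    """Split text into (width, chars) strands.

    A run of narrow characters shorter than min_run that touches a wider
    run is promoted to the wider width, so it joins that strand instead of
    starting a strand of its own. min_run=1 disables the promotion.
    """
    # Pass 1: maximal runs of characters that share a width.
    runs = []
    for ch in text:
        w = width_of(ch)
        if runs and runs[-1][0] == w:
            runs[-1][1].append(ch)
        else:
            runs.append((w, [ch]))

    # Pass 2: promote short narrow runs, then merge equal-width neighbours.
    strands = []
    for i, (w, chars) in enumerate(runs):
        widest_neighbour = max(
            (runs[j][0] for j in (i - 1, i + 1) if 0 <= j < len(runs)),
            default=w,
        )
        if len(chars) < min_run and widest_neighbour > w:
            w = widest_neighbour
        if strands and strands[-1][0] == w:
            strands[-1][1].extend(chars)
        else:
            strands.append((w, list(chars)))
    return [(w, "".join(chars)) for w, chars in strands]


def byte_size(strands: list[tuple[int, str]]) -> int:
    """Storage for the character data only, ignoring per-strand overhead."""
    return sum(w * len(chars) for w, chars in strands)


date = "2010年10月14日"  # Chinese Y-M-D: ASCII digits between CJK characters
for n in (1, 3, 5):
    s = segment(date, min_run=n)
    print(n, len(s), byte_size(s), s)
```

On the Chinese date, min_run=1 (no promotion) gives six strands and 14 bytes of character data; min_run=3 gives two strands (18 bytes); min_run=5 collapses it into a single 2-byte-wide strand (22 bytes). The threshold trades a few bytes for far fewer strands.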


This is definitely an issue with the proposal. But if it can be
surmounted, I think the stranded-string proposal is a nice one:
certainly better than settling on any particular UTF-N for everything.
UTF-8 works well for European languages and about half the content of
Asian languages, but that doesn't convince me that the other half of
Asian languages should get screwed, or that UTF-8 is the best internal
representation in the world.
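A quick size check makes the UTF-8 trade-off concrete: a CJK character from the Basic Multilingual Plane costs 3 bytes in UTF-8 but 2 in UTF-16, while an ASCII character costs 1 byte versus 2. The sample strings below are arbitrary illustrations, not measurements of real-world content.

```python
# Compare storage for a few sample strings in UTF-8 and UTF-16.
samples = {
    "English": "Unicode",
    "Chinese": "统一码",
    "Chinese date": "2010年10月14日",
}

sizes = {}
for name, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # "-le" avoids a byte-order mark
    sizes[name] = (utf8, utf16)
    print(f"{name}: {len(text)} chars, UTF-8 {utf8} B, UTF-16 {utf16} B")
```

Pure CJK text is 50% larger in UTF-8 (9 vs 6 bytes), while mixed content like the date string still favours UTF-8 (17 vs 22 bytes).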

Solving this issue may take a bit of sufficient smartness, however. If
we're trying to avoid that, then the API should offer ways to tune when
we switch encodings. At the very least, it should have some way of saying
that a (short) string should be forced into a single strand, using
whatever strand width is necessary. Working out the details of the rest
of the API could be tricky, though.
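One way such a knob might look is a constructor flag. This is a hypothetical Python sketch: StrandedString, force_single, char_at and byte_size are invented for illustration and are not part of any BitC API, and the default policy here is the naive one that opens a new strand at every change of width. It also shows where the O(log n) versus O(1) difference comes from: with one strand, indexing is direct; with s strands, it needs a binary search over the strands' start offsets (O(log s)).

```python
from bisect import bisect_right


def width_of(ch: str) -> int:
    """Bytes per code point: 1 for ASCII, 2 for the BMP, 4 otherwise."""
    cp = ord(ch)
    return 1 if cp < 0x80 else 2 if cp < 0x10000 else 4


class StrandedString:
    """Hypothetical stranded string: an ordered list of fixed-width strands."""

    def __init__(self, text: str, force_single: bool = False) -> None:
        strands = []
        if force_single:
            # One strand, wide enough for the widest character present.
            if text:
                width = max(width_of(c) for c in text)
                strands.append((width, list(text)))
        else:
            # Default policy: start a new strand at every change of width.
            for c in text:
                w = width_of(c)
                if strands and strands[-1][0] == w:
                    strands[-1][1].append(c)
                else:
                    strands.append((w, [c]))
        self.strands = [(w, "".join(cs)) for w, cs in strands]
        # Code-point offset at which each strand begins.
        self.starts = []
        offset = 0
        for _, chars in self.strands:
            self.starts.append(offset)
            offset += len(chars)
        self.length = offset

    def byte_size(self) -> int:
        """Storage for the character data only."""
        return sum(w * len(chars) for w, chars in self.strands)

    def char_at(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        if len(self.strands) == 1:
            # Single fixed-width strand: direct O(1) indexing.
            return self.strands[0][1][i]
        # Several strands: binary search over start offsets, O(log s).
        k = bisect_right(self.starts, i) - 1
        return self.strands[k][1][i - self.starts[k]]


date = "2010年10月14日"
split = StrandedString(date)
single = StrandedString(date, force_single=True)
print(len(split.strands), split.byte_size())    # 6 14
print(len(single.strands), single.byte_size())  # 1 22
print(split.char_at(4), single.char_at(4))      # 年 年
```

Forcing a single strand trades memory (22 vs 14 bytes here) for indexing with no strand lookup at all; deciding which strings deserve that trade is the policy question the API would need to expose.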

-- 
Live well,
~wren
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
