Re: [bitc-dev] Unicode and bitc

Jonathan S. Shapiro Fri, 15 Oct 2010 00:09:42 -0700

2010/10/14 Ben Kloosterman <[email protected]>

>  The main cons I see, besides the tree index/reference cost, are that each
> substring would need a field (which may be aligned to 4-8 bytes) or a char to
> indicate the encoding, and the higher initial/final parse overhead.
>


Yes. That field is two bits and can be encoded in the low-order two bits of
the relevant array reference.
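
As a rough illustration only (Python standing in for the runtime, with an
integer playing the role of the array reference): if references are assumed to
be at least 4-byte aligned, their two low-order bits are always zero and can
carry the encoding tag. The names Encoding, tag_ref and untag_ref, and the
particular tag values, are hypothetical and not anything BitC defines.

```python
from enum import IntEnum
from typing import Tuple

TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1  # 0b11: the two low-order bits


class Encoding(IntEnum):
    # Hypothetical assignment of tag values; two bits allow up to four.
    UTF8 = 0
    UTF16 = 1
    UTF32 = 2


def tag_ref(addr: int, enc: Encoding) -> int:
    """Store the encoding tag in the low-order bits of an aligned reference."""
    if addr & TAG_MASK:
        raise ValueError("reference must be aligned to 4 bytes")
    return addr | int(enc)


def untag_ref(tagged: int) -> Tuple[int, Encoding]:
    """Recover the original reference and the encoding tag."""
    return tagged & ~TAG_MASK, Encoding(tagged & TAG_MASK)
```

Under that alignment assumption the tag costs no extra field per substring;
the price is one mask operation before each dereference.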


>   Another biggie is adding a string of one-byte UTF-8 chars to a string of
> 2-byte chars; such operations would require a conversion each time. And this
> would be common in foreign languages, e.g. HTML and XML parsing (though
> splitting would be cheap, as it would often occur along natural lines).
>

Nope. The strands are deep-constant. Appending a string of UTF-8 bytes to a
string of UTF-16 bytes is merely a matter of appending metadata. The content
runs don't change at all.
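
A minimal sketch of what that means, in Python for illustration. It assumes a
flat sequence of strands rather than the tree discussed above, and the names
Strand, StrandString and from_text are hypothetical. Each strand pairs an
encoding with an immutable content run; appending builds a new strand list and
never touches or converts the bytes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Strand:
    encoding: str  # e.g. "utf-8" or "utf-16-le"
    data: bytes    # immutable content run, never rewritten


@dataclass(frozen=True)
class StrandString:
    strands: Tuple[Strand, ...] = ()

    def append(self, other: "StrandString") -> "StrandString":
        # Only the metadata (the tuple of strands) is rebuilt; the
        # underlying byte runs are shared with both operands.
        return StrandString(self.strands + other.strands)

    def decode(self) -> str:
        # Each strand is decoded with its own encoding, on demand.
        return "".join(s.data.decode(s.encoding) for s in self.strands)


def from_text(text: str, encoding: str) -> StrandString:
    """Build a single-strand string stored in the given encoding."""
    return StrandString((Strand(encoding, text.encode(encoding)),))
```

For example, from_text("<p>", "utf-8").append(from_text("abc", "utf-16-le"))
yields a two-strand string whose content runs are the very same byte objects
held by the two operands.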


>   I was referring to the standard-lib-agnostic issue William mentioned,
> which I'm not sure you are even pursuing, e.g. person A builds BitC with a
> UCS-2 standard lib, person B builds it with UTF-8; then such
> DLLs/libs/assemblies dropped on the same machine will not work together.
>

I don't see a problem there, so long as the specification for
[de]serialization is sufficient.

shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
