2010/10/14 Ben Kloosterman <[email protected]>

> The main cons I see is besides the tree index/reference cost, each
> substring would need a field (which may be aligned to 4-8 bytes) or char to
> indicate the encoding and the higher initial / final parse overhead.

Yes. That field is two bits and can be encoded in the low-order two bits of
the relevant array reference.

> Another biggy is adding a string of UTF-8 one bytes to a string of 2
> byte chars such operations would require a conversion each time. And this
> would be common in foreign languages eg html and xml parsing. (though
> splitting would be cheap as it would often occur along natural lines)

Nope. The strands are deep-constant. Appending a string of UTF-8 bytes to a
string of UTF-16 bytes is merely a matter of appending metadata. The content
runs don't change at all.

> I was referring to the standard lib agnostic issue William mentioned
> which I'm not sure you are even pursuing, eg person A builds BitC with
> UCS-2 standard lib, person B builds it with UTF-8 then dropping such
> DLL/lib/assemblies on the same machine will not work together.

I don't see a problem there, so long as the specification for
[de]serialization is sufficient.

shap
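
For illustration only, here is a minimal Python sketch of the two ideas in
this message: an encoding tag held in the low-order two bits of a reference,
and appending mixed-encoding strings by touching only metadata. It is not
BitC's implementation. The names Enc, Strand, Rope, pack_ref and unpack_ref
are hypothetical, the tag values are arbitrary, the "reference" is simulated
as a 4-byte-aligned integer, and a flat tuple of strands stands in for the
tree discussed in the thread.

```python
from dataclasses import dataclass
from enum import IntEnum


class Enc(IntEnum):
    # Hypothetical tag values; up to four encodings fit in two bits.
    UTF8 = 0
    UTF16 = 1
    UTF32 = 2


TAG_MASK = 0b11  # the low-order two bits of an aligned reference


def pack_ref(addr: int, enc: Enc) -> int:
    """Store the encoding tag in the low two bits of a 4-byte-aligned address."""
    if addr & TAG_MASK:
        raise ValueError("address must be 4-byte aligned")
    return addr | enc


def unpack_ref(ref: int):
    """Split a tagged reference back into (address, encoding)."""
    return ref & ~TAG_MASK, Enc(ref & TAG_MASK)


@dataclass(frozen=True)
class Strand:
    enc: Enc
    data: bytes  # immutable content run, never re-encoded


@dataclass(frozen=True)
class Rope:
    strands: tuple = ()

    def append(self, other: "Rope") -> "Rope":
        # Only the strand list (metadata) is rebuilt; the content runs
        # of both operands are shared, not copied or converted.
        return Rope(self.strands + other.strands)

    def decode(self) -> str:
        codecs = {Enc.UTF8: "utf-8", Enc.UTF16: "utf-16-le", Enc.UTF32: "utf-32-le"}
        return "".join(s.data.decode(codecs[s.enc]) for s in self.strands)
```

Under these assumptions, appending a UTF-16 rope to a UTF-8 rope yields a new
rope whose strands are the very same objects as the inputs; conversion happens
only if and when something calls decode().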
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
