>In further practice, the number of strands tends to be small, so the difference between O(log n) and O(1) is negligible.
I'm not sure this is true. For example, in all languages characters such as "<", ".", ">" and the digits are ASCII. In Chinese, a Y-M-D date mixes Chinese characters with ASCII digits; in fact, in nearly all languages you have UCS-2 characters interspersed with ASCII numbers and punctuation. So you would need some sort of more complex encoding rule, such that sequences shorter than some length n stay in the higher encoding form. This is also good because short strings would not need a tree and hence incur no cost.

It is an interesting option that deserves more thought; I'm not sold on byte indexes with operator overloading either. It also has the advantage of making line indexes trivial to introduce.

The main cons I see, besides the tree index/reference cost, are that each substring would need a field (which may be aligned to 4-8 bytes) or a char to indicate the encoding, and that the initial/final parse overhead is higher. Another big one is appending a string of one-byte UTF-8 chars to a string of two-byte chars: such operations would require a conversion each time. This would be common in non-English text, e.g. HTML and XML parsing (though splitting would be cheap, as it would often occur along natural lines):

    <div> 关于支付宝 </div>

My gut feel says this method is a bit too heavy unless byte indexes have too many issues. Still, I think it is superior to the C# and Java approaches, and likely to schemes that use UTF-8 with char/code-point indexes.

>>Lastly, is it a good idea to support multiple underlying schemes, aside from
>>legacy support methods like ToFixedCharArray()? Java and .NET have
>>survived without it, and having a single scheme helps interop. E.g. a
>>byte code file (.NET assembly or Windows DLL) will work on any machine, but with
>>different possible internal storage schemes this would not be possible.

>I think that's wrong. Reading a string from a bytecode file qualifies as
>serialization. All that is required is a normative byte code file format, and
>that's got nothing to do with the internal string representation.
I was referring to the standard-lib-agnostic issue William mentioned, which I'm not sure you are even pursuing. E.g. person A builds BitC with a UCS-2 standard lib and person B builds it with UTF-8; dropping the resulting DLLs/libs/assemblies on the same machine will then not work together.

Ben
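
To make the strand trade-offs above concrete, here is a minimal sketch in Python (not BitC, and not anyone's actual proposal). All names (Strand, StrandString, split_runs, min_run) are hypothetical. It assumes two encodings only: ASCII at one byte per char, and UTF-16-LE restricted to the BMP as a stand-in for UCS-2. Short ASCII runs inside mixed text stay in the wide form, each strand carries an encoding tag, and indexing is a binary search over strand start offsets.

```python
from bisect import bisect_right
from dataclasses import dataclass

NARROW = "ascii"      # one byte per char
WIDE = "utf-16-le"    # two bytes per char; stand-in for UCS-2 (BMP only)


@dataclass(frozen=True)
class Strand:
    encoding: str  # the per-strand tag counted as a cost in the discussion
    data: bytes

    @property
    def width(self) -> int:
        return 1 if self.encoding == NARROW else 2

    def __len__(self) -> int:
        return len(self.data) // self.width

    def char_at(self, i: int) -> str:
        w = self.width
        return self.data[i * w:(i + 1) * w].decode(self.encoding)


def split_runs(text: str, min_run: int = 4) -> list:
    """Split text into strands; ASCII runs shorter than min_run stay wide."""
    runs = []  # first pass: maximal runs of [is_ascii, chars]
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("sketch assumes BMP-only text")
        narrow = ord(ch) < 128
        if runs and runs[-1][0] == narrow:
            runs[-1][1].append(ch)
        else:
            runs.append([narrow, [ch]])
    mixed = len(runs) > 1
    merged = []  # second pass: demote short ASCII runs, merge neighbours
    for narrow, chars in runs:
        if narrow and mixed and len(chars) < min_run:
            narrow = False
        if merged and merged[-1][0] == narrow:
            merged[-1][1].extend(chars)
        else:
            merged.append([narrow, list(chars)])
    strands = []
    for narrow, chars in merged:
        enc = NARROW if narrow else WIDE
        strands.append(Strand(enc, "".join(chars).encode(enc)))
    return strands


class StrandString:
    def __init__(self, strands):
        self.strands = strands
        self.starts = []  # char offset at which each strand begins
        total = 0
        for s in strands:
            self.starts.append(total)
            total += len(s)
        self.length = total

    def char_at(self, i: int) -> str:
        """O(log n) in the number of strands, O(1) within a strand."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        k = bisect_right(self.starts, i) - 1
        return self.strands[k].char_at(i - self.starts[k])


s = StrandString(split_runs("<div> 关于支付宝 </div>"))
print([(x.encoding, len(x)) for x in s.strands], s.char_at(6))
```

With the default min_run of 4, the HTML example splits into three strands along the tag boundaries, which is the "natural lines" case. A date such as 2012年3月5日 keeps its single digits in the wide strand, and raising min_run to 5 collapses it to one strand with no tree at all. Concatenating strands of different encodings is deliberately left out, since that is where the conversion cost discussed above would show up.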
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
