On Wed, Oct 13, 2010 at 3:48 AM, Ben Kloosterman <[email protected]> wrote:
> Corrections
>
> 1)
>
> I stated .NET uses UCS-2 but it uses UTF-16 (never realized all those
> indexes would take O(n) to find the position)
>

Actually not. It uses the same misbegotten encoding that Java uses; I don't recall the name. Note that the .NET "char" is defined as 16 bits. Extended code points can *only* be encoded using strings, and .NET string indexing is defined to operate w.r.t. 16-bit units (and therefore isn't robust on extended characters). What happens in practice is that people carefully segregate strings that might contain extended characters.

> 2) I was also under the impression that BitC offered C-style mutable
> strings. So when I suggested removing indexing from string and converting
> to an array, that was what I meant.
>

BitC does not provide mutable strings.

> UCS-2, which offers O(1) indexing and finds but can't represent most Asian
> chars, requiring non-standard encoding on top of the internal string
> representation, and takes 2 bytes of storage per character.
>
> UTF-8 with O(n) indexing, which allows the developer to refer to the
> character. Note on x86 you can use a fast SSE2 0x10 bit pattern scan to
> count characters quicker.
>
> UTF-8 with O(1) byte indexing, with more focus on runtime methods and
> ToFixedCharArray methods for char indexing.
>

Good list, but incomplete. Ropes with O(log(n)) indexing work just fine in practice. Also worth noting that indexing is almost *never* random. The speed of the get next/previous operation is much more important than finding the initial location.

shap
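To make the UTF-8 points above concrete, here is a minimal sketch in Python. It is an illustration only: it is not BitC and not how any particular runtime stores strings. It shows the continuation-byte test behind the character-counting scan (in UTF-8, continuation bytes have the bit pattern 10xxxxxx, so counting the other bytes counts code points), and the next/previous stepping that makes sequential traversal cheap even though random indexing is O(n). The helper names are hypothetical, and the counting loop is plain scalar code rather than an SSE2 version.

```python
def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes match the bit pattern 10xxxxxx.
    return b & 0xC0 == 0x80


def count_code_points(data: bytes) -> int:
    # Each code point has exactly one non-continuation (lead) byte,
    # so an O(n) scan over the bytes counts code points. This per-byte
    # test is what a SIMD scan would vectorize.
    return sum(1 for b in data if not is_continuation(b))


def next_boundary(data: bytes, i: int) -> int:
    # Hypothetical helper: from the start of one code point, skip its
    # continuation bytes to reach the start of the next one.
    i += 1
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


def prev_boundary(data: bytes, i: int) -> int:
    # Hypothetical helper: step back to the start of the previous code point.
    i -= 1
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


if __name__ == "__main__":
    text = "aé€😀"  # 1-, 2-, 3- and 4-byte UTF-8 sequences
    utf8 = text.encode("utf-8")
    print(len(utf8), count_code_points(utf8))  # prints "10 4"

    # Forward walk: each step costs only the length of one code point.
    i, starts = 0, [0]
    while i < len(utf8):
        i = next_boundary(utf8, i)
        starts.append(i)
    print(starts)  # prints "[0, 1, 3, 6, 10]"

    # UTF-16 counts 16-bit units, so U+1F600 takes two of them
    # (a surrogate pair), which is why indexing by 16-bit unit is not
    # the same as indexing by character.
    print(len(text.encode("utf-16-le")) // 2)  # prints "5"
```

The forward and backward steps touch only the bytes of one code point, which is the sense in which get next/previous can stay fast even when finding the n-th character requires a scan.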
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
