Corrections

1) I stated that .NET uses UCS-2, but it uses UTF-16. (I never realized that all those indexes take O(n) to find a character position.) Windows has also been converting its internals from UCS-2 to UTF-16 since Windows 2000. Perl uses UTF-8. Java originally used UCS-2 and added UTF-16 supplementary character support in J2SE 5.0. All of these schemes need O(n) to index by character. I see no one doing what I proposed: O(1) byte-offset indexes (internal to the array), with character indexes available only through the ToArray method (except inside the lib).

2) I was also under the impression that BitC offered C-style mutable strings, so when I suggested removing the index from string and converting to an array, that was what I meant.

Anyway, the only viable options available are basically:

- UCS-2, which offers O(1) indexing and finds but can't represent characters outside the Basic Multilingual Plane (which includes a number of Asian characters), so those require a non-standard encoding on top of the internal string representation. It takes 2 bytes of storage per character.

- UTF-8 with O(n) indexing, which allows the developer to refer to the character. Note that on x86 you can use a fast SSE2 scan for the 10xxxxxx continuation-byte bit pattern to count characters more quickly.

- UTF-8 with O(1) byte indexing, with more focus on runtime methods, and ToFixedCharArray methods for char indexing.

If we go with the figure that 70% of a DB is strings (and I would say objects are the same, at least for business objects, as they map to the same thing), then 1 Gig of objects or DB in UTF-8 would be 1.7 Gig in UCS-2 (0.3 + 0.7 x 2) and 3.1 Gig in UCS-4 (0.3 + 0.7 x 4), assuming the string data is mostly one-byte characters in UTF-8.

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
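To make the UTF-8 options and the storage figures in the message concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not BitC code: all function names are hypothetical, `to_fixed_char_array` is a stand-in for the proposed ToFixedCharArray idea, and the byte loop is a scalar version of the continuation-byte scan (an SSE2 version would test 16 bytes at a time). The storage estimate assumes the string portion is mostly one-byte characters in UTF-8.

```python
def is_continuation(byte):
    # UTF-8 continuation bytes always have the top two bits set to 10
    # (0b10xxxxxx), so masking with 0xC0 yields 0x80 for them.
    return (byte & 0xC0) == 0x80


def count_chars(data):
    # Each code point has exactly one non-continuation byte, so counting
    # those bytes counts characters. This is a single O(n) pass.
    return sum(1 for b in data if not is_continuation(b))


def char_index_to_byte_offset(data, index):
    # O(n) character indexing: walk the bytes until the start of the
    # requested code point is reached, then return its byte offset.
    seen = -1
    for offset, b in enumerate(data):
        if not is_continuation(b):
            seen += 1
            if seen == index:
                return offset
    raise IndexError("character index out of range")


def to_fixed_char_array(data):
    # Hypothetical analogue of ToFixedCharArray: pay O(n) once to decode
    # into fixed-width code points, after which indexing is O(1).
    return [ord(c) for c in data.decode("utf-8")]


def widened_size(total, string_fraction, bytes_per_char):
    # Non-string data keeps its size; string data grows by the factor
    # bytes_per_char (assumes 1 byte per character in UTF-8).
    return total * (1 - string_fraction) + total * string_fraction * bytes_per_char


if __name__ == "__main__":
    text = "naïve 日本語".encode("utf-8")
    print(len(text), count_chars(text))        # byte length vs character count
    print(char_index_to_byte_offset(text, 6))  # byte offset of the 7th character
    print(to_fixed_char_array(text)[6])        # O(1) lookup after conversion
    print(widened_size(1.0, 0.7, 2), widened_size(1.0, 0.7, 4))
```

Under the third option, byte offsets into `data` are O(1) by construction, and only code that really needs character positions pays for the walk or for the one-time conversion to a fixed-width array.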
