Yes, one of the options for the internal storage of the string class is to use a different array type depending on the contents:
1. uint8s if all the code points are <= 0xFF
2. uint16s if all the code points are <= 0xFFFF
3. uint32s otherwise

That way the internal storage always corresponds directly to the code point index, which makes random access fast. Case #3 occurs rarely, so it is OK if it takes more storage in that case.

Mark

*— Il meglio è l’inimico del bene —*

On Wed, May 18, 2011 at 14:46, Erik Corry <erik.co...@gmail.com> wrote:

> 2011/5/17 Wes Garland <w...@page.ca>:
> > If you're already storing UTF-8 strings internally, then you are already
> > doing something "expensive" (like copying) to get their code units into and
> > out of JS; so no incremental perf impact by not having a common UTF-16
> > backing store.
> >
> >> (As a note, Gecko and WebKit both use UTF-16 internally; I would be
> >> _really_ surprised if Trident does not. No idea about Presto.)
> >
> > FWIW - last time I scanned the v8 sources, it appeared to use a
> > three-representation class, which could store either ASCII, UCS2, or UTF-8.
> > Presumably ASCII could also be ISO-Latin-1, as both are exact, naive,
> > byte-sized UCS2/UTF-16 subsets.
>
> V8 has ASCII strings and UCS2 strings. There are no Latin1 strings
> and UTF-8 is only used for IO, never for internal representation.
> WebKit uses UCS2 throughout and V8 is able to work directly on WebKit
> UCS2 strings that are on WebKit's C++ heap.
>
> I like Shawn Steele's suggestion.
>
> --
> Erik Corry
> _______________________________________________
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
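To make the three storage cases at the top of this message concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how any engine actually implements it: the class name `CodePointString` and its methods are hypothetical, and it assumes the standard `array` module offers a 4-byte unsigned typecode on the platform.

```python
from array import array

# Assumption: a 4-byte unsigned typecode exists ('I' on most platforms, 'L' on some).
_U32 = next(code for code in "IL" if array(code).itemsize == 4)


class CodePointString:
    """Hypothetical string class: one fixed-width array, width chosen from the contents."""

    def __init__(self, text: str) -> None:
        code_points = [ord(ch) for ch in text]
        highest = max(code_points, default=0)
        if highest <= 0xFF:
            typecode = "B"    # case 1: uint8
        elif highest <= 0xFFFF:
            typecode = "H"    # case 2: uint16
        else:
            typecode = _U32   # case 3: uint32
        self._units = array(typecode, code_points)

    @property
    def bytes_per_code_point(self) -> int:
        return self._units.itemsize

    def __len__(self) -> int:
        return len(self._units)

    def code_point_at(self, index: int) -> int:
        # The array index is the code point index, so lookup is constant time.
        return self._units[index]


s = CodePointString("a\U0001F600b")
print(s.bytes_per_code_point, len(s), hex(s.code_point_at(1)))
```

The cost is that a single supplementary code point widens the whole string to four bytes per element, which is the trade-off accepted above because case #3 is rare.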