Re: Full Unicode strings strawman

Mark Davis ☕ Wed, 18 May 2011 14:55:00 -0700

Yes, one of the options for the internal storage of the string class is to
use different arrays depending on the contents.


   1. uint8s if all the code points are <= 0xFF
   2. uint16s if all the code points are <= 0xFFFF
   3. uint32s otherwise

That way, element i of the internal array always holds code point i, so
random access by code point index is fast (constant time). Case 3 only
arises for strings containing characters above U+FFFF, which is rare, so it
is OK if it takes more storage in that case.
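As a rough illustration of the three cases above, here is a minimal Python sketch. The class name `CodePointString` and its methods are hypothetical. Python's own `str` already hides its internal layout, so this only models the idea: pick the narrowest fixed-width array that fits every code point, then index it directly by code point.

```python
from array import array

# 32-bit storage: "I" is 4 bytes on common platforms; fall back to "L"
# (guaranteed at least 4 bytes) if it is narrower.
_WIDE = "I" if array("I").itemsize >= 4 else "L"


class CodePointString:
    """Hypothetical string class storing one array element per code point,
    using the narrowest element width that can hold all of them."""

    def __init__(self, text: str) -> None:
        cps = [ord(ch) for ch in text]  # Python str iterates by code point
        top = max(cps, default=0)
        if top <= 0xFF:
            typecode = "B"    # case 1: uint8
        elif top <= 0xFFFF:
            typecode = "H"    # case 2: uint16
        else:
            typecode = _WIDE  # case 3: uint32
        self._data = array(typecode, cps)

    def __len__(self) -> int:
        # Length in code points, since each element is one code point.
        return len(self._data)

    def code_point_at(self, index: int) -> int:
        # Element i is code point i in every case, so no scanning is needed.
        return self._data[index]

    @property
    def bytes_per_code_point(self) -> int:
        return self._data.itemsize
```

For example, `CodePointString("naïve")` uses one byte per code point, `CodePointString("€5")` uses two, and a string containing U+1F600 falls into case 3, yet `code_point_at` still returns that code point by direct indexing.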

Mark

*Il meglio è l’inimico del bene* ("The best is the enemy of the good")


On Wed, May 18, 2011 at 14:46, Erik Corry <erik.co...@gmail.com> wrote:

> 2011/5/17 Wes Garland <w...@page.ca>:
> > If you're already storing UTF-8 strings internally, then you are already
> > doing something "expensive" (like copying) to get their code units into and
> > out of JS, so there is no incremental performance cost from not having a
> > common UTF-16 backing store.
> >
> >>
> >> (As a note, Gecko and WebKit both use UTF-16 internally; I would be
> >> _really_ surprised if Trident does not.  No idea about Presto.)
> >
> > FWIW - last time I scanned the V8 sources, it appeared to use a
> > three-representation class, which could store ASCII, UCS2, or UTF-8.
> > Presumably ASCII could also be ISO-Latin-1, as both are exact, naive,
> > byte-sized UCS2/UTF-16 subsets.
>
> V8 has ASCII strings and UCS2 strings.  There are no Latin1 strings
> and UTF-8 is only used for IO, never for internal representation.
> WebKit uses UCS2 throughout and V8 is able to work directly on WebKit
> UCS2 strings that are on WebKit's C++ heap.
>
> I like Shawn Steele's suggestion.
>
> --
> Erik Corry
> _______________________________________________
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
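For contrast, the split Erik describes (one-byte ASCII strings, two-byte UCS2 strings, UTF-8 only at the I/O boundary) can be sketched as below. This is a hypothetical Python model, not V8 code: `decode_input` and `encode_output` are invented names, and storing characters above U+FFFF as surrogate pairs in the 16-bit form is an assumption made for illustration.

```python
import sys
from array import array

# UTF-16 codec matching native byte order, so array("H") round-trips correctly.
_NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def decode_input(raw: bytes) -> array:
    """Hypothetical I/O boundary: turn UTF-8 input into an in-memory string."""
    text = raw.decode("utf-8")
    if text.isascii():
        # ASCII text has identical bytes in UTF-8: keep one byte per char.
        return array("B", raw)
    # Anything else (including Latin-1 range) becomes 16-bit code units;
    # characters above U+FFFF are encoded as surrogate pairs.
    units = array("H")
    units.frombytes(text.encode(_NATIVE_UTF16))
    return units


def encode_output(s: array) -> bytes:
    """Hypothetical I/O boundary: turn an in-memory string back into UTF-8."""
    if s.typecode == "B":
        return s.tobytes()
    return s.tobytes().decode(_NATIVE_UTF16).encode("utf-8")
```

Here `decode_input("a\U0001F600".encode("utf-8"))` yields a 16-bit array of length 3, and indexing it gives the surrogates 0xD83D and 0xDE00 rather than the code point. That is where this layout differs from the three-case scheme, in which element i is always code point i.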
