On Sat, Mar 9, 2013 at 12:48 PM, Luis de Bethencourt <l...@debethencourt.com> wrote:
> On Mar 7, 2013 10:37 PM, "Brady Eidson" <beid...@apple.com> wrote:
>> > On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff <msab...@apple.com>
>> > wrote:
>> >> The various tokenizers / lexers work various ways to handle LChar
>> >> versus UChar input streams. Most of the other tokenizers are
>> >> templatized on input character type. In the case of HTML, the
>> >> tokenizer handles a UChar character at a time. For 8 bit input
>> >> streams, the zero extension of a LChar to a UChar is zero cost.
>> >> There may be additional performance to be gained by doing all
>> >> other possible handling in 8 bits, but an 8 bit stream can still
>> >> contain escapes that need a UChar representation as you point out.
>> >> Using a character type template approach was deemed to be too
>> >> unwieldy for the HTML tokenizer. The HTML tokenizer uses
>> >> SegmentedStrings that can consist of sub strings with either LChar
>> >> or UChar. That is where the LChar to UChar zero extension happens
>> >> for an 8 bit sub string.
>> >>
>> >> My research at the time showed that there were very few UTF-16
>> >> only resources (<<5% IIRC), although I expect the number to grow.
>>
>> On Mar 7, 2013, at 2:16 PM, Adam Barth <aba...@webkit.org> wrote:
>> > Yes, I understand how the HTML tokenizer works. :)
>>
>> I didn't understand these details, and I really appreciate Michael
>> describing them. I'm also glad others on the mailing list had an
>> opportunity to get something out of this.
>
> I agree with Brady. I got some interesting learning out of this thread.
> Always nice to read explanations and documentation about how things work.
> Valuable content.
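
For anyone following along, the zero extension Michael describes can be sketched roughly as below. This is only an illustration under stated assumptions, not WebKit's code: `MixedWidthStream`, `Substring`, `append`, `next`, and `drain` are hypothetical names, and `LChar` / `UChar` are redefined locally as stand-ins for the real types. The idea is that the stream holds a run of 8-bit and 16-bit substrings, and the consumer always sees one 16-bit code unit at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Local stand-ins for the character types discussed in the thread
// (assumed definitions, not copied from WebKit headers).
using LChar = std::uint8_t;  // 8-bit Latin-1 code unit
using UChar = char16_t;      // 16-bit UTF-16 code unit

// Hypothetical substring that is either all 8-bit or all 16-bit.
using Substring = std::variant<std::vector<LChar>, std::vector<UChar>>;

// Hypothetical, much simplified analogue of a segmented string:
// a queue of substrings of mixed width, read one UChar at a time.
class MixedWidthStream {
public:
    void append(Substring s) {
        // Skip empty substrings so that "no substrings left" means "at end".
        bool empty = std::visit([](const auto& chars) { return chars.empty(); }, s);
        if (!empty)
            m_substrings.push_back(std::move(s));
    }

    bool atEnd() const { return m_substrings.empty(); }

    // Returns the next code unit as a UChar regardless of the width of
    // the substring it came from.
    UChar next() {
        assert(!atEnd());
        const Substring& current = m_substrings.front();
        UChar c = std::visit([this](const auto& chars) -> UChar {
            // For an 8-bit substring this cast is the zero extension;
            // for a 16-bit substring it is a no-op.
            return static_cast<UChar>(chars[m_offset]);
        }, current);
        std::size_t size = std::visit([](const auto& chars) { return chars.size(); }, current);
        if (++m_offset == size) {
            m_substrings.pop_front();
            m_offset = 0;
        }
        return c;
    }

private:
    std::deque<Substring> m_substrings;
    std::size_t m_offset = 0;  // position within the front substring
};

// Reads the whole stream into a UTF-16 string, the way a tokenizer that
// works on UChar would consume it.
std::u16string drain(MixedWidthStream& stream) {
    std::u16string out;
    while (!stream.atEnd())
        out.push_back(stream.next());
    return out;
}
```

A dedicated 8-bit path would instead keep the inner loop on `LChar` for as long as the current substring is 8-bit, which is the trade-off being debated in this thread.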

In retrospect, I think what I was reacting to was msaboff's statement that
an unnamed group of people had decided that the HTML tokenizer was too
unwieldy to have a dedicated 8-bit path. In particular, it's unclear to me
who made that decision. I certainly do not consider the matter decided.

Adam
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev