On Tue, Mar 20, 2018 at 11:00 AM, smaug <sm...@welho.com> wrote:
> On 03/19/2018 09:30 PM, Kris Maglione wrote:
>>
>> On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
>>>
>>> It appears that currently we allow atomicizing invalid UTF-16 string,
>>> which are impossible to look up by UTF-8 key and we don't allow
>>> atomicizing invalid UTF-8.
>>>
>>> I need to change some things in this area in response to changing
>>> error handling of UTF-8 to UTF-16 XPCOM string conversions to be more
>>> secure, so I want to check if I should change things a bit more.
>>>
>>> I can well imagine that the current state is exactly what we want:
>>> Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
>>> normally reach the atom machinery.
>>>
>>> Am I correct in assuming we don't want changes here?
>>>
>>> (One imaginable change would be replacing invalid sequences in both
>>> UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)
>>
>>
>> Leaving aside the question of whether validation is desirable, I'd worry
>> about the performance impact. We atomize UTF-16 strings all over the place
>> in DOM code (and even have a main-thread pseudo-hashtable optimization for
>> them). Validation might be relatively cheap, but I'd still expect that
>> relative cheapness to add up fairly quickly.
>
>
> Yeah, all the atom handling is very hot code. Unless there is some actual
> serious bug to fix, I wouldn't change the handling.

OK. I'll leave the UTF-16 case unchanged and will make the minimal
changes on the UTF-8 side to retain the existing outward behavior
without burning the tree. Hopefully I can make the UTF-8 case faster
while at it. It depended on not-so-great code.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform