On Tue, Mar 20, 2018 at 11:00 AM, smaug <sm...@welho.com> wrote:
> On 03/19/2018 09:30 PM, Kris Maglione wrote:
>>
>> On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
>>>
>>> It appears that currently we allow atomicizing invalid UTF-16 strings,
>>> which are impossible to look up by UTF-8 key and we don't allow
>>> atomicizing invalid UTF-8.
>>>
>>> I need to change some things in this area in response to changing
>>> error handling of UTF-8 to UTF-16 XPCOM string conversions to be more
>>> secure, so I want to check if I should change things a bit more.
>>>
>>> I can well imagine that the current state is exactly what we want:
>>> Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
>>> normally reach the atom machinery.
>>>
>>> Am I correct in assuming we don't want changes here?
>>>
>>> (One imaginable change would be replacing invalid sequences in both
>>> UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)
>>
>>
>> Leaving aside the question of whether validation is desirable, I'd worry
>> about the performance impact. We atomize UTF-16 strings all over the place
>> in DOM code (and even have a main-thread pseudo-hashtable optimization for
>> them). Validation might be relatively cheap, but I'd still expect that
>> relative cheapness to add up fairly quickly.
>
>
> Yeah, all the atom handling is very hot code. Unless there is some actual
> serious bug to fix, I wouldn't change the handling.

OK. I'll leave the UTF-16 case unchanged and will make the minimal
changes on the UTF-8 side to retain the existing outward behavior
without burning the tree. Hopefully I can make the UTF-8 case faster
while at it. It depended on not-so-great code.
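
To illustrate the two cases discussed in this thread, here is a minimal
sketch using only the Rust standard library. It is not the Gecko atom table
code; the function names are hypothetical and exist only for this example.
It shows why a UTF-16 string with an unpaired surrogate has no UTF-8 key,
and what the U+FFFD replacement alternative would produce.

```rust
/// Strict conversion (hypothetical helper): returns None when the UTF-16
/// input contains an unpaired surrogate, i.e. when no valid UTF-8 key exists.
fn utf16_to_utf8_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Lossy conversion (hypothetical helper): each unpaired surrogate is
/// replaced with U+FFFD before the result is encoded as UTF-8.
fn utf16_to_utf8_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Lossy UTF-8 handling (hypothetical helper): invalid byte sequences are
/// replaced with U+FFFD, yielding a valid string.
fn utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

With these helpers, `utf16_to_utf8_strict(&[0x0061, 0xD800])` yields `None`,
while the lossy variant yields "a" followed by U+FFFD. Note that replacement
is not injective: a lone 0xD800 and a lone 0xDC00 both map to the same
U+FFFD string, so distinct invalid inputs would share one atom under that
scheme.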

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
