I find it intriguing that the update intends to enforce the decoding of the **shortest** sequences, yet now wants to treat **maximal** sequences as a single unit of arbitrary length. UTF-8 was designed so that a simple state machine would NEVER need to parse more than 4 bytes.
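To illustrate the bounded-lookahead property, here is a minimal sketch (the function name and structure are mine, not from any standard library) of a decoder that never reads more than 4 bytes per sequence and, on failure, replaces only the single offending byte with U+FFFD before resynchronizing at the next byte:

```python
REPLACEMENT = "\uFFFD"

# Valid lead bytes mapped to (number of continuation bytes,
# allowed range of the FIRST continuation byte), per RFC 3629.
def _lead_info(b: int):
    if 0xC2 <= b <= 0xDF:
        return 1, (0x80, 0xBF)
    if b == 0xE0:
        return 2, (0xA0, 0xBF)
    if 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
        return 2, (0x80, 0xBF)
    if b == 0xED:
        return 2, (0x80, 0x9F)  # exclude surrogates
    if b == 0xF0:
        return 3, (0x90, 0xBF)
    if 0xF1 <= b <= 0xF3:
        return 3, (0x80, 0xBF)
    if b == 0xF4:
        return 3, (0x80, 0x8F)  # cap at U+10FFFF
    return None  # invalid lead byte (0x80..0xC1, 0xF5..0xFF)

def decode_one_fffd_per_byte(data: bytes) -> str:
    """Decode UTF-8; each byte that cannot begin or complete a valid
    sequence becomes exactly one U+FFFD. Lookahead never exceeds 3
    bytes past the lead byte (4 bytes total)."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                       # ASCII fast path
            out.append(chr(b)); i += 1; continue
        info = _lead_info(b)
        if info is None:                   # stray continuation / bad lead
            out.append(REPLACEMENT); i += 1; continue
        need, first_range = info
        seq, j, ok = [b], i + 1, True
        for k in range(need):
            lo, hi = first_range if k == 0 else (0x80, 0xBF)
            if j >= n or not (lo <= data[j] <= hi):
                ok = False; break
            seq.append(data[j]); j += 1
        if ok:
            out.append(bytes(seq).decode("utf-8"))
            i = j
        else:
            # Only the lead byte is consumed as the error; the byte
            # that broke the sequence is re-examined on its own.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)
```

With this policy a forward scan and a backward scan yield the same number of replacements, since each bad byte accounts for exactly one U+FFFD regardless of direction.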
For me, as soon as the first byte encountered is invalid, the current sequence should be stopped there and treated as an error (replaced by U+FFFD if replacement is enabled, instead of returning an error or throwing an exception), and then every further trailing byte should be treated in isolation as its own error. The number of U+FFFD replacements returned would then be the same whether you scan the input forward or backward, without **ever** reading more than 4 bytes in either direction.

This matters when parsing reaches the end of a buffer, where you would block on I/O to read the previous or next block. Managing a cache of multiple blocks (more than 2) becomes a problem with this unexpected change: it creates new performance problems and memory constraints, in addition to new possible attacks if the parser needs to keep multiple buffers in memory instead of treating them individually with a single overhead buffer, throwing away each buffer on the fly as soon as it is fully parsed.

2017-05-18 1:41 GMT+02:00 Asmus Freytag via Unicode <unicode@unicode.org>:

> On 5/17/2017 2:31 PM, Richard Wordingham via Unicode wrote:
>
> There's some sort of rule that proposals should be made seven days in
> advance of the meeting. I can't find it now, so I'm not sure whether
> the actual rule was followed, let alone what authority it has.
>
> Ideally, proposals that update algorithms or properties of some
> significance should be required to be reviewed in more than one pass. The
> procedures of the UTC are a bit weak in that respect, at least compared to
> other standards organizations. The PRI process addresses that issue to some
> extent.
>
> A./