Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode Thu, 18 May 2017 02:04:27 -0700

On 18 May 2017, at 07:18, Henri Sivonen via Unicode <unicode@unicode.org> wrote:
> 
> the decision complicates U+FFFD generation when validating UTF-8 by state 
> machine.


It *really* doesn’t.  Even if you’re hell-bent on using a pure state machine 
approach, you need to add maybe two additional error states 
(two-trailing-bytes-to-eat-then-fffd and one-trailing-byte-to-eat-then-fffd) on 
top of the states you already have.  The implementation complexity argument is 
a *total* red herring.
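
To make that concrete, here is a rough sketch of the idea in Python. It is an illustration rather than anyone's actual decoder: the function name decode_utf8 and the variables need and eat are made up for this example, and it assumes one particular reading of the behaviour under discussion, namely that a trailing byte which is out of range for its lead byte (overlong, surrogate, or above U+10FFFF) is swallowed together with the remaining trailing bytes and the whole lot becomes a single U+FFFD. The values eat == 2 and eat == 1 play the role of the two extra error states. Lead bytes C0, C1 and F5..FF are simply treated as lone invalid bytes here, which is a simplification.

```python
REPLACEMENT = "\ufffd"

def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per ill-formed sequence (sketch)."""
    out = []
    need = 0             # trailing bytes still expected in a well-formed sequence
    eat = 0              # error states: trailing bytes to swallow before U+FFFD
    lo, hi = 0x80, 0xBF  # allowed range for the next trailing byte
    cp = 0               # code point accumulated so far
    i = 0
    while i < len(data):
        b = data[i]
        if eat:
            # two-/one-trailing-byte(s)-to-eat-then-fffd
            if 0x80 <= b <= 0xBF:
                eat -= 1
                i += 1
                if eat == 0:
                    out.append(REPLACEMENT)
            else:
                # Not a trailing byte: close the error and reprocess b.
                out.append(REPLACEMENT)
                eat = 0
            continue
        if need:
            if lo <= b <= hi:
                cp = (cp << 6) | (b & 0x3F)
                need -= 1
                lo, hi = 0x80, 0xBF
                i += 1
                if need == 0:
                    out.append(chr(cp))
            elif 0x80 <= b <= 0xBF:
                # Trailing byte, but out of range for this lead byte:
                # swallow it and enter the matching eat state.
                eat = need - 1
                need = 0
                i += 1
                if eat == 0:
                    out.append(REPLACEMENT)
            else:
                # Truncated sequence: emit U+FFFD and reprocess b.
                out.append(REPLACEMENT)
                need = 0
            continue
        # Start state.
        i += 1
        lo, hi = 0x80, 0xBF
        if b < 0x80:
            out.append(chr(b))
        elif 0xC2 <= b <= 0xDF:
            need, cp = 1, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            need, cp = 2, b & 0x0F
            if b == 0xE0:
                lo = 0xA0   # exclude overlong forms
            elif b == 0xED:
                hi = 0x9F   # exclude surrogates
        elif 0xF0 <= b <= 0xF4:
            need, cp = 3, b & 0x07
            if b == 0xF0:
                lo = 0x90   # exclude overlong forms
            elif b == 0xF4:
                hi = 0x8F   # exclude values above U+10FFFF
        else:
            # Stray trailing byte, or C0, C1, F5..FF.
            out.append(REPLACEMENT)
    if need or eat:
        # Input ended in the middle of a sequence.
        out.append(REPLACEMENT)
    return "".join(out)
```

With this sketch, decode_utf8(b"\xe0\x80\x80") goes through the one-trailing-byte-to-eat state and yields a single U+FFFD, and decode_utf8(b"\xf0\x80\x80\x80") goes through both eat states and also yields a single U+FFFD. Everything else is what a validating state machine already has to do.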

> 2) Procedural: To be considered in the future, proposals to change
> what the standard suggests or requires implementations to do should
> consider different implementation strategies and discuss the impact of
> the change in the light of the different implementation strategies (in
> the matter at hand, I think the proposal should have included a
> discussion of the impact on UTF-8 validation state machines)

Well, let’s discuss that here and now (see above).  Do you, for some reason, 
think that it’s more complicated than I suggest?

Kind regards,

Alastair.

--
http://alastairs-place.net
