> it’s more meaningful for whoever sees the output to see a single U+FFFD
> representing the illegally encoded NUL than it is to see two U+FFFDs, one
> for an invalid lead byte and then another for an “unexpected” trailing byte.

I disagree.  Some applications may find a single U+FFFD representing an
illegally encoded 2-byte NUL more meaningful than two U+FFFDs.  But then you
can't tell whether it was an illegally encoded 2-byte NUL, an illegally
encoded 3-byte NUL, or something else, so information that other applications
may be interested in is lost.

Personally, I prefer the "emit a U+FFFD if the sequence is invalid, drop the 
byte, and try again" approach.  
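
To make that concrete, here is a rough Python sketch of the approach.  It is 
only an illustration: the names decode_one and decode_drop_and_retry are made 
up for this example, and the well-formedness check is a simplified one 
(rejects overlong forms, surrogates, and values above U+10FFFF), not any 
particular library's decoder.

```python
REPLACEMENT = "\ufffd"


def decode_one(data, i):
    """Return (char, length) if a well-formed UTF-8 sequence starts at i, else None.

    Hypothetical helper for illustration only.
    """
    b0 = data[i]
    if b0 < 0x80:
        return chr(b0), 1
    if 0xC2 <= b0 <= 0xDF:
        need, cp, lowest = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        need, cp, lowest = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        need, cp, lowest = 3, b0 & 0x07, 0x10000
    else:
        # 0x80..0xC1 (stray trail bytes, overlong 2-byte leads) and 0xF5..0xFF
        return None
    tail = data[i + 1:i + 1 + need]
    if len(tail) < need or any((b & 0xC0) != 0x80 for b in tail):
        return None  # truncated, or a trail byte is missing
    for b in tail:
        cp = (cp << 6) | (b & 0x3F)
    if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return None  # overlong form, out of range, or surrogate
    return chr(cp), need + 1


def decode_drop_and_retry(data):
    """Emit U+FFFD for an invalid sequence, drop one byte, and try again."""
    out = []
    i = 0
    while i < len(data):
        result = decode_one(data, i)
        if result is None:
            out.append(REPLACEMENT)
            i += 1  # drop only the offending byte, then retry at the next one
        else:
            char, length = result
            out.append(char)
            i += length
    return "".join(out)
```

With this sketch the 2-byte form C0 80 comes out as two U+FFFDs and the 
3-byte form E0 80 80 as three, so the output still shows how many bytes the 
bad sequence occupied.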

-Shawn
