08-Feb-2014 09:45, Jonathan M Davis wrote:
On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
Actually, thinking this through some more, if we can replace invalid Unicode
with 0xFFFD, and have all algorithms work with that and consider it valid
Unicode (rather than getting weird bugs due to invalid Unicode), then if
decode returned that on error rather than throwing, we wouldn't actually need
to check the return value. It wouldn't matter that the Unicode was invalid.
So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who
_did_ care could call isValidUnicode to validate the Unicode first, and those
who didn't wouldn't need to worry about UTFException being thrown, because
everything would still work even if the string was invalid Unicode.

Hm.. yes. I gotta read the whole thread next time :)


So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by
proposing that we return that rather than throwing, then I rescind my
assessment that throwing was the best way to go and have to agree that
returning 0xFFFD would be better. I was responding under the assumption that
you had to check for 0xFFFD and respond to it in order to avoid having your code
be buggy, in which case throwing would be far better. But if 0xFFFD is
considered valid Unicode,

It is.

then returning that would be a fantastic solution.
And if that's the case, we only need two functions, not three:

1. decode, which returns 0xFFFD on decode failure

2. isValidUnicode, which returns whether the string is valid


Yay.
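
A rough sketch of the two-function split described above, written in Python rather than D purely for illustration. The names decode_lossy and is_valid_unicode are hypothetical stand-ins for the proposed decode and isValidUnicode, and UTF-8 input is an assumption; this is not the Phobos API.

```python
def decode_lossy(data: bytes) -> str:
    """Hypothetical stand-in for the proposed decode: never throws."""
    # errors="replace" substitutes U+FFFD (REPLACEMENT CHARACTER)
    # for malformed input instead of raising an exception.
    return data.decode("utf-8", errors="replace")


def is_valid_unicode(data: bytes) -> bool:
    """Hypothetical stand-in for isValidUnicode, for callers that care."""
    try:
        data.decode("utf-8")  # strict mode raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


broken = b"ab\xffcd"  # the byte 0xFF never appears in well-formed UTF-8
print(repr(decode_lossy(broken)))
print(is_valid_unicode(broken))
```

Code that does not care about validity calls only the first function; code that does care checks with the second one up front.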

And I actually really like the idea that we could just operate on invalid
Unicode as valid Unicode this way, making it so that most code doesn't need to
care, and code that _does_ need to care can validate the strings first. Right
now, pretty much all string code needs to care in order to avoid processing
invalid Unicode, which is much messier.

Hooray! The good part is that, for example, I can run a regex on partially broken text and still get sane results out of it.
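
To make the regex point concrete, here is a small sketch, again in Python as a stand-in and assuming UTF-8 input. The helper words_in is hypothetical. The bad byte is decoded to U+FFFD, which is not a word character, so the match simply steps over it.

```python
import re


def words_in(data: bytes) -> list:
    """Extract word-like runs from possibly malformed UTF-8 (hypothetical helper)."""
    # Lossy decode: malformed bytes become U+FFFD rather than raising.
    text = data.decode("utf-8", errors="replace")
    # U+FFFD is a symbol, not alphanumeric, so \w+ does not match it.
    return re.findall(r"\w+", text)


print(words_in(b"hello \xff world"))
```

The intact parts of the text still match as usual; only the damaged spot is skipped.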

- Jonathan M Davis



--
Dmitry Olshansky
