On Friday, 7 February 2014 at 23:45:06 UTC, Jonathan M Davis wrote:
On Friday, February 07, 2014 23:01:46 Meta wrote:
On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
> On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
>> 07-Feb-2014 20:29, Andrej Mitrovic wrote:
>> > On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>> >> Add a bugzilla and let's define isValid that returns bool!
>> >
>> > Add std.utf.decode() to that as well. IOW, it should have an
>> > overload which returns a status code
>> >
>> Much simpler - it returns a special dchar to designate bad
>> encoding. And there is one defined by Unicode spec.
>
> Isn't that actually worse? Unless you're suggesting that we stop
> throwing on decode errors, then functions like std.array.front will
> have to check the result on every call to see whether it was valid or
> not and thus whether they should throw, which would mean extra
> overhead over simply having decode throw on decode errors. validate
> has no business throwing, and we definitely should add isValidUnicode
> (or isValid or whatever you want to call it) for validation purposes.
> Code can then call that to validate that a string is valid and not
> worry about any UTFExceptions being thrown as long as it doesn't
> manipulate the string in a way that could result in its Unicode
> becoming invalid. However, I would argue that assuming that everyone
> is going to validate their strings and that pretty much all
> string-related functions shouldn't ever have to worry about invalid
> Unicode is just begging for subtle bugs all over the place IMHO.
> You're essentially dealing with error codes at that point, and I
> think that experience has shown quite clearly that error codes are
> generally a bad way to go. Almost no one checks them unless they have
> to. I think that having decode throw on invalid Unicode is exactly
> what it should be doing. The problem is that validate shouldn't.
>
> - Jonathan M Davis

You could always return an Option!char. Nullable won't work
because it lets you access the naked underlying value.

How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).

We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.
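For readers who have not met the pattern, the difference between a sentinel value and an Option can be sketched as follows. This is only an illustration in Rust, not D and not a proposal for std.utf; the function names `decode_sentinel`, `decode_checked`, and `describe` are made up for the example.

```rust
/// Sentinel style: an invalid value is mapped to U+FFFD. The result is an
/// ordinary `char`, so nothing forces the caller to compare it against the
/// sentinel before using it.
fn decode_sentinel(code: u32) -> char {
    char::from_u32(code).unwrap_or(char::REPLACEMENT_CHARACTER)
}

/// Option style: the caller only gets a `char` by handling the `None` case.
fn decode_checked(code: u32) -> Option<char> {
    char::from_u32(code)
}

/// Using the Option result requires a match (or an equivalent check);
/// passing the `Option<char>` where a `char` is expected does not compile.
fn describe(code: u32) -> String {
    match decode_checked(code) {
        Some(c) => format!("decoded {:?}", c),
        None => String::from("invalid code point"),
    }
}

fn main() {
    // 0xD800 is a surrogate, so it is not a valid Unicode scalar value.
    println!("{:?}", decode_sentinel(0xD800));
    println!("{}", describe(0x41));
    println!("{}", describe(0xD800));
}
```

With the sentinel version a forgotten check silently lets U+FFFD flow onward, while the Option version turns the forgotten check into a type error.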

Code that doesn't want to risk a UTFException being thrown can validate up front - and that validator function should return bool and _not_ throw. But having decode not throw is going to be error-prone. It also doesn't help performance-wise, because it still has to do all of the same validity checks as it decodes. It's just that instead of throwing, it returns an error value. I really think that having decode throw on invalid Unicode is the right decision, and I don't see what we gain by making it not throw.
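
The split described above (a validator that only answers yes or no, next to a decode that reports failure as an error) might look roughly like this. Again this is a Rust sketch rather than D; `is_valid_utf8` and `first_char` are hypothetical names, and Rust's `Result` stands in for a thrown UTFException.

```rust
use std::str;

/// Hypothetical non-throwing validator: reports validity as a plain bool.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

/// Stand-in for a throwing decode: invalid input surfaces as an error that
/// the caller has to propagate or handle. Note that this checks the whole
/// slice, not just the first character.
fn first_char(bytes: &[u8]) -> Result<Option<char>, str::Utf8Error> {
    let s = str::from_utf8(bytes)?;
    Ok(s.chars().next())
}

fn main() {
    let good = "héllo".as_bytes();
    // 0xFF can never appear in well-formed UTF-8.
    let bad: &[u8] = &[0x68, 0xFF, 0x6C];

    for input in [good, bad] {
        if is_valid_utf8(input) {
            println!("valid, first char: {:?}", first_char(input));
        } else {
            println!("rejected up front");
        }
    }
}
```

Both functions run the same validity checks, which is the performance point made above: the only difference is how a failure is reported.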

- Jonathan M Davis
