The new uber-loop is fantastic. So I guess the main point still to be finalized is whether the default arithmetic ops will auto-promote or error when long addition overflows.
Playing around with the latest equals branch:

user=> (def n 9223372036854775810)
#'user/n
user=> (* (/ n 3) 3)
9223372036854775810N
user=> (* (/ n 2) 2)
java.lang.ArithmeticException: integer overflow
user=> (def x (/ n 4))
#'user/x
user=> (+ x x x x)
9223372036854775810N
user=> (+ (+ x x) (+ x x))
java.lang.ArithmeticException: integer overflow
user=> (range (- n 2) n)
(9223372036854775808N 9223372036854775809N)
user=> (range (- n 3) n)
java.lang.ArithmeticException: integer overflow

I understand exactly why some of these work and some don't. My main point here is to illustrate that without the full numeric tower supported by the default ops, there can certainly be some surprises. There is a "pathway" with the standard ops from longs to rational numbers to bigints, but you can't cross directly from longs to bigints. Similarly, you can round-trip from bigints to rationals and back, but you can't round-trip from bigints to longs and back. So the result of a computation depends on the path you follow. Maybe we can live with these idiosyncrasies for the speed benefits, but it's worth being aware of.

The range example is easily enough fixed. If this branch becomes the standard, range should probably be modified so that if the *upper bound* is a bigint, inc' is used rather than inc to generate the range. But I think this illustrates the kinds of issues we're headed toward regardless of which default is chosen: people who write libraries will be choosing between overflowing and auto-boxing primitives, and it won't always be clear from the documentation what the consequences are. The current implementation of range works perfectly well with longs, and it works perfectly well with bigints, but it breaks when your lower and upper bounds cross the boundary. This is exactly the kind of thing that might not be thought of when writing test cases, so errors like this could lurk for quite a while without being spotted.
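Since Clojure's numeric ops bottom out in JVM arithmetic, the two candidate defaults can be sketched in plain Java. This is only an illustration of the two strategies, not Clojure's implementation: checked addition via Math.addExact throws the same java.lang.ArithmeticException seen in the transcript above, while a promoting addition silently widens to BigInteger, like the auto-promoting ops.

```java
import java.math.BigInteger;

public class OverflowSketch {
    // Checked addition: throws ArithmeticException at the long boundary,
    // like the error-upon-overflow default.
    static long addChecked(long a, long b) {
        return Math.addExact(a, b);
    }

    // Promoting addition: detects overflow and widens to BigInteger,
    // like the auto-promoting default (Clojure's +').
    static Number addPromoting(long a, long b) {
        long sum = a + b;
        // Standard overflow test: the sum's sign differs from both operands'.
        if (((a ^ sum) & (b ^ sum)) < 0) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Promoting: crosses the long boundary gracefully.
        System.out.println(addPromoting(Long.MAX_VALUE, 1)); // 9223372036854775808
        // Checked: same inputs throw instead.
        try {
            addChecked(Long.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow");
        }
    }
}
```

The point of the sketch is that both behaviors are cheap to implement on the JVM; the debate is purely about which one the short operator names should get.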
The people on the side of overflow-error-as-default feel that these sorts of runtime errors are no more problematic than the many other runtime errors that can occur in Clojure, such as an out-of-bounds exception when accessing a vector. But I see these types of errors as very different. An out-of-bounds exception is easy enough to prevent -- there is a simple test you can include in your code to make sure your index is in bounds before you access your vector. It is much harder to determine in advance whether a sequence of computations will "cross the long boundary" for all possible inputs.

This is probably the main reason I continue to advocate for auto-promoting ops as the default. Error-upon-overflow adds an element of run-time risk, and requires careful thought and additional testing to achieve the same level of reliability. I *want* error-upon-overflow operations to be slightly harder to use, so that library writers will use them judiciously and consciously, will be very aware of the extra effort needed to test their functions for all numbers, and will clearly document any restrictions on the kinds of numbers that are permitted.

Like I said before, Clojure's built-in range can easily be adjusted to work well for speed *and* handle both longs and bigints gracefully. But it serves as a good example of how the two defaults will affect library writers. If auto-promotion is the default, most library writers will just use the +, *, -, inc, dec operators, and their code will work for all numbers right out of the box. A library writer who wants to optimize for speed would have to go to a bit of extra effort to add the apostrophes, and would hopefully at that point give careful thought to the consequences, catching the fact that this breaks ranges that span from longs to bigints, and adjusting the code accordingly.
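The proposed range fix -- dispatch on the upper bound, using the checked increment on the fast path and a promoting one otherwise -- can be sketched as follows. This is a hypothetical illustration in Java, not the actual clojure.core/range change; the name range and the BigInteger signature are mine:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RangeSketch {
    static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Hypothetical range (lower inclusive, upper exclusive) that picks its
    // increment strategy by inspecting the upper bound, as suggested above.
    static List<BigInteger> range(BigInteger lower, BigInteger upper) {
        List<BigInteger> out = new ArrayList<>();
        if (upper.compareTo(LONG_MAX) <= 0) {
            // Everything fits in a long: fast checked path (cf. inc).
            // incrementExact can reach at most Long.MAX_VALUE here, so it never throws.
            long hi = upper.longValueExact();
            for (long i = lower.longValueExact(); i < hi; i = Math.incrementExact(i)) {
                out.add(BigInteger.valueOf(i));
            }
        } else {
            // Upper bound is a "bigint": promoting path (cf. inc'), cannot overflow.
            for (BigInteger i = lower; i.compareTo(upper) < 0; i = i.add(BigInteger.ONE)) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BigInteger n = new BigInteger("9223372036854775810"); // 2^63 + 2
        // The case that threw in the REPL transcript now crosses the boundary cleanly:
        System.out.println(range(n.subtract(BigInteger.valueOf(3)), n));
    }
}
```

The fast path keeps primitive long arithmetic for the common case, so the speed argument for checked ops is preserved; only ranges whose upper bound already exceeds a long pay for BigInteger arithmetic.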
On the other hand, if error-upon-overflow is the default, that is what most people will use, and we'll end up with a lot of code that breaks when crossing the long boundary.

I don't use any bigints, or anything even close to overflowing a long, in the kind of code I write for work. If error-upon-overflow wins as the default, I'll gain performance benefits with no immediate downside. But ultimately, I feel that anything that helps me reason about and trust my code, and helps me trust the robustness of code written by others that I rely upon, is a principle worth fighting for.