On Jun 17, 2010, at 9:55 PM, Mark Engelberg wrote:
It's great to be discussing the possibility of enhanced primitive
support.
I like most of the proposals. But this part troubles me:
# For bigints, specify bigints
* new literal bigint format 42N
# bigints contagious, no more auto-reductions
* (+ 0N 21 21) => 42N
One of the big reasons I prefer Clojure over languages like Scala/F#
is that in Clojure, numbers "just work". I don't have to worry about
overflow errors, and the use of numbers in sets/hash tables is
generally pretty intuitive (it gets a little wonky if you're using
numbers produced by Java methods, but at least you're good if you stay
within Clojure).
As a case in point, consider writing a naive recursive factorial
function.
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
In Clojure, (fact 3) produces the Integer 6. (fact 40) produces the
BigInteger 815915283247897734345611269596115894272000000000. It just
works. The answers produced by fact can be used in collections,
compared for equality with other numbers.
In F#, Scala, if you want factorial to not break on something like an
input of 40, you have to essentially use the bigint literal syntax
like what you're proposing:
(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))
Now, the results of this function are "contaminating bigints" which
can't be used as effectively with other Clojure data structures. As
you pointed out, it causes problems when the same number can have two
different type representations depending on the sequence of
computations used to get to that number. This will always be somewhat
of a problem due to Java interop, but it's important to me that all
numbers arrived at through Clojure computations just work with respect
to equality and use in sets/hash tables. The lack of auto-reduction,
in conjunction with the proposal to make equality take numeric type
into account, will really cripple the utility of bigints.
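The wonkiness is easy to demonstrate in plain Java (the same machinery Clojure's collections sit on): `equals` is type-strict across Java's boxed numeric classes, so the "same" value in two representations silently misses in a hash-based collection. A minimal sketch:

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

public class EqualityRift {
    public static void main(String[] args) {
        Long boxed = 6L;
        BigInteger big = BigInteger.valueOf(6);

        // equals() is type-strict across Java's numeric wrapper classes:
        System.out.println(boxed.equals(big));   // false
        System.out.println(big.equals(boxed));   // false

        // so the "same" number can silently miss in a hash-based collection
        Set<Object> s = new HashSet<>();
        s.add(boxed);
        System.out.println(s.contains(big));     // false
    }
}
```

This is exactly the situation a type-strict `=` without auto-reduction would surface to Clojure users whose computations mix reduced and unreduced results.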
Even more frustrating, in my experience with languages like F# and
Scala, functions written like this are way, way slower. The reason is
that Java's bigint arithmetic is sloooooow. To protect yourself from
overflow, you've forced the function to use bigint arithmetic on
1N*2N*3N*..., well before you even reach a bigint. For this
particular function, it reaches bigint status quite quickly, so the
performance degradation is fairly minimal. But I have found that in
general, requiring programmers to explicitly use bigint arithmetic for
small numbers in order to protect from overflow is an absolute
performance killer. Clojure's technique of auto-promotion to bigint
upon overflow, and then auto-reduction when appropriate, is way, way
faster than using bigints all the time.
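The promote-on-overflow technique can be sketched in plain Java. The helper below is hypothetical, not Clojure's actual implementation; it uses the JDK's `Math.multiplyExact` (added in Java 8) to stay on the fast primitive path until a multiplication actually overflows:

```java
import java.math.BigInteger;

public class PromoteOnOverflow {
    // Hypothetical helper: multiply in longs while possible,
    // fall back to BigInteger only once the result overflows.
    static Object multiply(long a, long b) {
        try {
            return Math.multiplyExact(a, b);  // fast primitive path
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(multiply(6, 7));               // 42
        System.out.println(multiply(Long.MAX_VALUE, 2));  // promoted to BigInteger
    }
}
```

The early iterations of a factorial run entirely in cheap long arithmetic; BigInteger costs are paid only once the values genuinely need them.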
In summary:
No auto-reduction would make bigints second-class citizens and make
them difficult to use in other functions and collections.
No auto-promotion to bigints means that programmers who want to
protect from overflow have to use bigints everywhere, which destroys
performance.
So what would be the downside of implementing most of these proposals
(e.g., 1 is a literal for long 1), but retain auto-promotion and
auto-reduction rather than throwing an error for primitive long math?
Seems to me like this would be the best of both worlds. What am I
missing?
You raise several points; I'll take them in turn. First, things are
not as dire as you state. Certainly one could write a version of fact
that hardwired bigints as you showed. But you need not - here's the
original naive definition:
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
Such logic that doesn't specify one of the components (in this case,
n) is still polymorphic, as are operations with mixed components. So
the power is in the hands of the supplier of n:
user=> (fact 10)
3628800
user=> (fact 40)
java.lang.ArithmeticException: integer overflow
user=> (fact 40N)
815915283247897734345611269596115894272000000000N
I think you can always write code in this way, such that it will work
with any of the numeric types, without dictating.
----
It is true that Java's BigIntegers are slow, especially for numbers
small enough to fit in Long or Integer, so using them for smaller
numbers has overhead. But there are other options, not yet pursued
here. For instance, Kawa has a higher-performing biginteger that uses
an internal int component until it overflows. I'm sure no one has a
particular attachment to BigInteger should we find a better
alternative. It would be easy to create a big integer class based on
the Kawa concept, perhaps with a long or int basis overflowing to
BigInteger itself.
http://groups.google.com/group/jvm-languages/msg/0413ed05d1371907
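The Kawa concept can be sketched as a small Java class. The names here are illustrative, not Kawa's actual API: the value lives in a primitive long until an operation overflows, and only then switches to a BigInteger representation:

```java
import java.math.BigInteger;

// Sketch of the Kawa idea: keep a long inside until an operation
// overflows, only then switch to a BigInteger representation.
// Class and method names are illustrative, not Kawa's actual API.
public class HybridInt {
    private final long small;       // valid while big == null
    private final BigInteger big;   // non-null once we've overflowed

    private HybridInt(long v)       { this.small = v; this.big = null; }
    private HybridInt(BigInteger v) { this.small = 0; this.big = v; }

    public static HybridInt of(long v) { return new HybridInt(v); }

    public HybridInt multiply(HybridInt other) {
        if (big == null && other.big == null) {
            try {
                return new HybridInt(Math.multiplyExact(small, other.small));
            } catch (ArithmeticException overflow) { /* fall through */ }
        }
        return new HybridInt(toBig().multiply(other.toBig()));
    }

    private BigInteger toBig() {
        return big != null ? big : BigInteger.valueOf(small);
    }

    @Override public String toString() { return toBig().toString(); }
}
```

Small values never touch BigInteger at all, so the "bigints are slow for small numbers" objection largely disappears while the contagion semantics stay uniform.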
----------
As far as the performance killer goes, that depends on your
perspective. It is my belief that the overwhelming majority of code
*never* uses numbers that exceed long, and those people are suffering
a 'performance killer' on the order of 10-20x by having numbers be
boxed all the time.
------
The interaction between equality and the BigInteger schism is
something I am still thinking about. That is why the branches are
staged. Only the 'equal' branch has the type-strict =.
--------
The thing that you are missing is that there is no single
representation on the JVM* that can hold either a primitive or a
reference to an Object like a BigInteger. That is where the current
semantic difference creeps in - a primitive op sourced by a primitive
must have as its target a primitive - should it produce an object
there would be nowhere to put it. And any interface points (like the
math operations themselves) cannot dynamically take either a primitive
or an object. So the auto-promotion has an unavoidable cost of the use
of objects to hold all values, and the allocation thereof. Opting out
of the objects means a change in semantics. Auto-promotion cannot be
made a uniform semantic for both primitives and objects.
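The constraint shows up directly in Java's type system. A sketch, assuming a checked multiply built on the JDK's `Math.multiplyExact`: a method whose return type is the primitive long has no slot that could hold a BigInteger reference, so on overflow its only options are to wrap silently or to throw:

```java
public class PrimitiveTarget {
    // The return type is the primitive long: there is nowhere here to
    // put a BigInteger, so overflow must either wrap or throw.
    static long mulChecked(long a, long b) {
        return Math.multiplyExact(a, b);  // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println(mulChecked(1_000_000L, 1_000_000L));  // fits: 1000000000000
        try {
            mulChecked(Long.MAX_VALUE, 2);
        } catch (ArithmeticException e) {
            System.out.println("integer overflow - nowhere to put a BigInteger");
        }
    }
}
```

Auto-promotion, by contrast, forces the return type up to Object, which is precisely the boxing cost described above.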
---------
Another alternative is to provide a second set of operations with auto-
promoting, object-based semantics. These would never operate on
primitives. Since a key argument for auto-promotion is the poor
behavior of small bigints, making a better bigint seems preferable.
--------
In the end, this does put the interests of two sides of the community
at odds. Only one set of interests can be the default. Part of the
point of this discussion is to measure the sides (so speak up, people!).
I think the current default doesn't serve the majority well, and has a
semantic rift. Making ordinary programs that use longs and doubles run
fast requires extensive and tedious annotation that is easy to get
wrong. This work proves it could be completely otherwise.
Rich
* There are proposals:
http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
Note that even on mature platforms like CL with very good tagging and
fixnum systems, actually getting at full-width 32- and 64-bit hardware
operations requires extensive, tedious and error-prone annotation.
I.e. a tagged number is not a hardware number.