On Jun 17, 2010, at 9:55 PM, Mark Engelberg wrote:

It's great to be discussing the possibility of enhanced primitive support.

I like most of the proposals.  But this part troubles me:

# For bigints, specify bigints
   * new literal bigint format 42N
# bigints contagious, no more auto-reductions
   * (+ 0N 21 21) => 42N

One of the big reasons I prefer Clojure over languages like Scala/F#
is that in Clojure, numbers "just work".  I don't have to worry about
overflow errors, and the use of numbers in sets/hash tables is
generally pretty intuitive (it gets a little wonky if you're using
numbers produced by Java methods, but at least you're good if you stay
within Clojure).

As a case in point, consider writing a naive recursive factorial function.

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

In Clojure, (fact 3) produces the Integer 6.  (fact 40) produces the
BigInteger 815915283247897734345611269596115894272000000000.  It just
works.  The answers produced by fact can be used in collections,
compared for equality with other numbers.
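For instance (an illustrative REPL transcript of today's behavior):

user=> (fact 5)
120
user=> (contains? #{120} (fact 5))
true
user=> (= (fact 5) 120)
true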

In F#, Scala, if you want factorial to not break on something like an
input of 40, you have to essentially use the bigint literal syntax
like what you're proposing:
(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))

Now, the results of this function are "contaminating bigints" which
can't be used as effectively with other Clojure data structures.  As
you pointed out, it causes problems when the same number can have two
different type representations depending on the sequence of
computations used to get to that number.  This will always be somewhat
of a problem due to Java interop, but it's important to me that all
numbers arrived at through Clojure computations just work with respect
to equality and use in sets/hash tables.  The lack of auto-reduction,
in conjunction with the proposal to make equality take numeric type
into account, will really cripple the utility of bigints.
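To make the worry concrete, here's the kind of mismatch I mean (a sketch, assuming the proposed type-strict = and no auto-reduction):

(def seen #{6})            ; a set of ordinary longs
(* 1N 6)                   ; => 6N, never reduced back to 6
(contains? seen (* 1N 6))  ; => false: a type-strict = won't
                           ;    match 6N against 6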

Even more frustrating, in my experience with languages like F# and
Scala, functions written like this are way, way slower.  The reason is
that Java's bigint arithmetic is sloooooow.  To protect yourself from
overflow, you've forced the function to use bigint arithmetic on
1N*2N*3N*..., well before you even reach a bigint.  For this
particular function, it reaches bigint status quite quickly, so the
performance degradation is fairly minimal.  But I have found that in
general, requiring programmers to explicitly use bigint arithmetic for
small numbers in order to protect from overflow is an absolute
performance killer.  Clojure's technique of auto-promotion to bigint
upon overflow, and then auto-reduction when appropriate, is way, way
faster than using bigints all the time.
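A quick way to see the gap (a sketch using the proposed N literals; absolute timings vary by JVM and hardware, but the bigint loop is dramatically slower):

(time (dotimes [_ 1000000] (* 3 4)))    ; long arithmetic
(time (dotimes [_ 1000000] (* 3N 4N)))  ; BigInteger arithmetic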

In summary:
No auto-reduction would make bigints second-class citizens and make
them difficult to use in other functions and collections.
No auto-promotion to bigints means that programmers who want to
protect from overflow have to use bigints everywhere, which destroys
performance.

So what would be the downside of implementing most of these proposals
(e.g., 1 is a literal for long 1), but retain auto-promotion and
auto-reduction rather than throwing an error for primitive long math?
Seems to me like this would be the best of both worlds.  What am I
missing?


You raise several points; I'll take them in turn. First, things are not as dire as you state. Certainly one could write a version of fact that hardwired bigints, as you showed. But you need not - here's the original naive definition:

(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))

Logic like this, which doesn't pin down the type of one of its components (in this case, n), is still polymorphic, as are operations with mixed operands. So the power is in the hands of the supplier of n:

user=> (fact 10)
3628800

user=> (fact 40)
java.lang.ArithmeticException: integer overflow

user=> (fact 40N)
815915283247897734345611269596115894272000000000N

I think you can always write code in this way, so that it works with any of the numeric types without dictating a representation.

----
It is true that Java's BigIntegers are slow, especially for numbers small enough to fit in a Long or Integer, so using them for small numbers has overhead. But there are other options, not yet pursued here. For instance, Kawa has a higher-performing big-integer class that uses an internal int component until it overflows. I'm sure no one has a particular attachment to BigInteger should we find a better alternative. It would be easy to create a big-integer class based on the Kawa concept, perhaps with a long or int basis that overflows to BigInteger itself.

http://groups.google.com/group/jvm-languages/msg/0413ed05d1371907
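As a sketch of the idea, here is addition as a bare function rather than a proper class (illustrative only; assumes unchecked-add over longs):

(import 'java.math.BigInteger)

(defn hybrid-add
  "Stay in a long until the result would overflow; only then hand
   back a BigInteger. Illustrative sketch, not the real class."
  [a b]
  (let [a   (long a)
        b   (long b)
        sum (unchecked-add a b)]
    ;; signed overflow: operands agree in sign but the sum disagrees
    (if (and (= (neg? a) (neg? b))
             (not= (neg? a) (neg? sum)))
      (.add (BigInteger/valueOf a) (BigInteger/valueOf b))
      sum)))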

----------
As far as 'performance killer' goes, that depends on your perspective. It is my belief that the overwhelming majority of code *never* uses numbers that exceed long, and those people are suffering a 'performance killer' on the order of 10-20x by having numbers be boxed all the time.
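The kind of function I mean (a sketch assuming this branch's ^long hints; today every step of this loop boxes):

(defn sum-to ^long [^long n]
  (loop [i 0, acc 0]
    (if (< i n)
      (recur (inc i) (+ acc i))
      acc)))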

------
The interaction between equality and the BigInteger schism is something I am still thinking about. That is why the branches are staged. Only the 'equal' branch has the type-strict =.

--------
The thing that you are missing is that there is no single representation on the JVM* that can hold either a primitive or a reference to an Object like a BigInteger. That is where the current semantic difference creeps in - a primitive op sourced by a primitive must have as its target a primitive - should it produce an object there would be nowhere to put it. And any interface points (like the math operations themselves) cannot dynamically take either a primitive or an object. So the auto-promotion has an unavoidable cost of the use of objects to hold all values, and the allocation thereof. Opting out of the objects means a change in semantics. Auto-promotion cannot be made a uniform semantic for both primitives and objects.
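Concretely, a sketch of the resulting semantics (details may shift as the branches evolve):

;; The return slot here is a primitive long. If (* n n) overflows,
;; a BigInteger result would have nowhere to go, so the op must
;; either stay a long or throw - on this branch, it throws.
(defn square ^long [^long n]
  (* n n))

(square 3)                     ; => 9
(square 9223372036854775807)   ; ArithmeticException: integer overflow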

---------
Another alternative is to provide a second set of operations with auto-promoting, object-based semantics. These would never operate on primitives. Since a key argument for auto-promotion is the poor behavior of small bigints, making a better bigint seems preferable.
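The hybrid-add sketch above could serve as one such operation (name hypothetical):

user=> (hybrid-add 9223372036854775807 1)
9223372036854775808   ; promoted to BigInteger rather than throwing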

--------
In the end, this does put the interests of two sides of the community at odds. Only one set of interests can be the default. Part of the point of this discussion is to measure the sides (so speak up, people!).

I think the current default doesn't serve the majority well, and has a semantic rift. Making ordinary programs that use longs and doubles run fast requires extensive and tedious annotation that is easy to get wrong. This work proves it could be completely otherwise.

Rich

* There are proposals:

http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

Note that even on mature platforms like CL, with very good tagging and fixnum systems, actually getting at full-width 32- and 64-bit hardware operations requires extensive, tedious and error-prone annotation. I.e., a tagged number is not a hardware number.
