On Jun 17, 2010, at 9:55 PM, Mark Engelberg wrote:
It's great to be discussing the possibility of enhanced primitive
support.
I like most of the proposals. But this part troubles me:
# For bigints, specify bigints
* new literal bigint format 42N
# bigints contagious, no more auto-reductions
* (+ 0N 21 21) => 42N
One of the big reasons I prefer Clojure over languages like Scala/F#
is that in Clojure, numbers "just work". I don't have to worry about
overflow errors, and the use of numbers in sets/hash tables is
generally pretty intuitive (it gets a little wonky if you're using
numbers produced by Java methods, but at least you're good if you stay
within Clojure).
As a case in point, consider writing a naive recursive factorial
function.
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
In Clojure, (fact 3) produces the Integer 6. (fact 40) produces the
BigInteger 815915283247897734345611269596115894272000000000. It just
works. The answers produced by fact can be used in collections,
compared for equality with other numbers.
In F#, Scala, if you want factorial to not break on something like an
input of 40, you have to essentially use the bigint literal syntax
like what you're proposing:
(defn fact [n]
  (if (zero? n) 1N (* n (fact (dec n)))))
Now, the results of this function are "contaminating bigints" which
can't be used as effectively with other Clojure data structures. As
you pointed out, it causes problems when the same number can have two
different type representations depending on the sequence of
computations used to get to that number. This will always be somewhat
of a problem due to Java interop, but it's important to me that all
numbers arrived at through Clojure computations just work with respect
to equality and use in sets/hash tables. The lack of auto-reduction,
in conjunction with the proposal to make equality take numeric type
into account, will really cripple the utility of bigints.
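The wonkiness is easy to demonstrate in plain Java (the same machinery Clojure's collections sit on): `equals` is type-strict across Java's boxed numeric classes, so the "same" value in two representations silently misses in a hash-based collection. A minimal sketch:

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

public class EqualityRift {
    public static void main(String[] args) {
        Long boxed = 6L;
        BigInteger big = BigInteger.valueOf(6);

        // equals() is type-strict across Java's numeric wrapper classes:
        System.out.println(boxed.equals(big));   // false
        System.out.println(big.equals(boxed));   // false

        // so the "same" number can silently miss in a hash-based collection
        Set<Object> s = new HashSet<>();
        s.add(boxed);
        System.out.println(s.contains(big));     // false
    }
}
```

This is exactly the situation a type-strict `=` without auto-reduction would surface to Clojure users whose computations mix reduced and unreduced results.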
Even more frustrating, in my experience with languages like F# and
Scala, functions written like this are way, way slower. The reason is
that Java's bigint arithmetic is sloooooow. To protect yourself from
overflow, you've forced the function to use bigint arithmetic on
1N*2N*3N*..., well before you even reach a bigint. For this
particular function, it reaches bigint status quite quickly, so the
performance degradation is fairly minimal. But I have found that in
general, requiring programmers to explicitly use bigint arithmetic for
small numbers in order to protect from overflow is an absolute
performance killer. Clojure's technique of auto-promotion to bigint
upon overflow, and then auto-reduction when appropriate, is way, way
faster than using bigints all the time.
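The promote-on-overflow technique can be sketched in plain Java. The helper below is hypothetical, not Clojure's actual implementation; it uses the JDK's `Math.multiplyExact` (added in Java 8) to stay on the fast primitive path until a multiplication actually overflows:

```java
import java.math.BigInteger;

public class PromoteOnOverflow {
    // Hypothetical helper: multiply in longs while possible,
    // fall back to BigInteger only once the result overflows.
    static Object multiply(long a, long b) {
        try {
            return Math.multiplyExact(a, b);  // fast primitive path
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(multiply(6, 7));               // 42
        System.out.println(multiply(Long.MAX_VALUE, 2));  // promoted to BigInteger
    }
}
```

The early iterations of a factorial run entirely in cheap long arithmetic; BigInteger costs are paid only once the values genuinely need them.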
In summary:
No auto-reduction would make bigints second-class citizens and make
them difficult to use in other functions and collections.
No auto-promotion to bigints means that programmers who want to
protect from overflow have to use bigints everywhere, which destroys
performance.
So what would be the downside of implementing most of these proposals
(e.g., 1 is a literal for long 1), but retain auto-promotion and
auto-reduction rather than throwing an error for primitive long math?
Seems to me like this would be the best of both worlds. What am I
missing?
You raise several points; I'll take them in turn. First, things are
not as dire as you state. Certainly one could write a version of fact
that hardwired bigints as you showed. But you need not - here's the
original naive definition:
(defn fact [n]
  (if (zero? n) 1 (* n (fact (dec n)))))
Such logic that doesn't specify one of the components (in this case,
n) is still polymorphic, as are operations with mixed components. So
the power is in the hands of the supplier of n:
user=> (fact 10)
3628800
user=> (fact 40)
java.lang.ArithmeticException: integer overflow
user=> (fact 40N)
815915283247897734345611269596115894272000000000N
I think you can always write code in this way, such that it will work
with any of the numeric types, without dictating.
----
It is true that Java's BigIntegers are slow, especially for numbers
small enough to fit in Long or Integer, so using them for smaller
numbers has overhead. But there are other options, not yet pursued
here. For instance, Kawa has a higher-performing biginteger that uses
an internal int component until it overflows. I'm sure no one has a
particular attachment to BigInteger should we find a better
alternative. It would be easy to create a big integer class based on
the Kawa concept, perhaps with a long or int basis overflowing to
BigInteger itself.
http://groups.google.com/group/jvm-languages/msg/0413ed05d1371907
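The Kawa concept can be sketched as a small Java class. The names here are illustrative, not Kawa's actual API: the value lives in a primitive long until an operation overflows, and only then switches to a BigInteger representation:

```java
import java.math.BigInteger;

// Sketch of the Kawa idea: keep a long inside until an operation
// overflows, only then switch to a BigInteger representation.
// Class and method names are illustrative, not Kawa's actual API.
public class HybridInt {
    private final long small;       // valid while big == null
    private final BigInteger big;   // non-null once we've overflowed

    private HybridInt(long v)       { this.small = v; this.big = null; }
    private HybridInt(BigInteger v) { this.small = 0; this.big = v; }

    public static HybridInt of(long v) { return new HybridInt(v); }

    public HybridInt multiply(HybridInt other) {
        if (big == null && other.big == null) {
            try {
                return new HybridInt(Math.multiplyExact(small, other.small));
            } catch (ArithmeticException overflow) { /* fall through */ }
        }
        return new HybridInt(toBig().multiply(other.toBig()));
    }

    private BigInteger toBig() {
        return big != null ? big : BigInteger.valueOf(small);
    }

    @Override public String toString() { return toBig().toString(); }
}
```

Small values never touch BigInteger at all, so the "bigints are slow for small numbers" objection largely disappears while the contagion semantics stay uniform.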
----------
As far as the performance killer goes, that depends on your
perspective. It is my belief that the overwhelming majority of code
*never* uses numbers that exceed long, and those people are suffering
a 'performance killer' on the order of 10-20x by having numbers be
boxed all the time.
------
The interaction between equality and the BigInteger schism is
something I am still thinking about. That is why the branches are
staged. Only the 'equal' branch has the type-strict =.
--------
The thing that you are missing is that there is no single
representation on the JVM* that can hold either a primitive or a
reference to an Object like a BigInteger. That is where the current
semantic difference creeps in - a primitive op sourced by a primitive
must have as its target a primitive - should it produce an object
there would be nowhere to put it. And any interface points (like the
math operations themselves) cannot dynamically take either a primitive
or an object. So the auto-promotion has an unavoidable cost of the use
of objects to hold all values, and the allocation thereof. Opting out
of the objects means a change in semantics. Auto-promotion cannot be
made a uniform semantic for both primitives and objects.
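The constraint shows up directly in Java's type system. A sketch, assuming a checked multiply built on the JDK's `Math.multiplyExact`: a method whose return type is the primitive long has no slot that could hold a BigInteger reference, so on overflow its only options are to wrap silently or to throw:

```java
public class PrimitiveTarget {
    // The return type is the primitive long: there is nowhere here to
    // put a BigInteger, so overflow must either wrap or throw.
    static long mulChecked(long a, long b) {
        return Math.multiplyExact(a, b);  // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println(mulChecked(1_000_000L, 1_000_000L));  // fits: 1000000000000
        try {
            mulChecked(Long.MAX_VALUE, 2);
        } catch (ArithmeticException e) {
            System.out.println("integer overflow - nowhere to put a BigInteger");
        }
    }
}
```

Auto-promotion, by contrast, forces the return type up to Object, which is precisely the boxing cost described above.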
---------
Another alternative is to provide a second set of operations with auto-
promoting, object-based semantics. These would never operate on
primitives. Since a key argument for auto-promotion is the poor
behavior of small bigints, making a better bigint seems preferable.
--------
In the end, this does put the interests of two sides of the community
at odds. Only one set of interests can be the default. Part of the
point of this discussion is to measure the sides (so speak up, people!).
I think the current default doesn't serve the majority well, and has a
semantic rift. Making ordinary programs that use longs and doubles run
fast requires extensive and tedious annotation that is easy to get
wrong. This work proves it could be completely otherwise.
Rich
* There are proposals:
http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
Note that even on mature platforms like CL with very good tagging and
fixnum systems, actually getting at full-width 32- and 64-bit hardware
operations requires extensive, tedious and error-prone annotation.
I.e. a tagged number is not a hardware number.