Hi Jason,

Have you guys taken a look at core.matrix for any of this stuff? We're also shooting for near-Java-parity for all of the core operations on large double arrays.

  (use 'clojure.core.matrix)
  (require '[criterium.core :as c])

  (let [a (double-array (range 10000))]
    (c/quick-bench (esum a)))

  WARNING: Final GC required 69.30384798936066 % of runtime
  Evaluation count : 45924 in 6 samples of 7654 calls.
               Execution time mean : 12.967112 µs
      Execution time std-deviation : 326.480900 ns
     Execution time lower quantile : 12.629252 µs ( 2.5%)
     Execution time upper quantile : 13.348527 µs (97.5%)
                     Overhead used : 3.622005 ns

All the core.matrix functions get dispatched via protocols, so they work on any kind of multi-dimensional matrix (not just Java arrays). This adds a tiny amount of overhead (about 10-15ns), but it is negligible when dealing with medium-to-large vectors/matrices/arrays.

I'm interested in feedback and hopefully we can collaborate: I'm keen to get the best optimised numerical functions we can in Clojure.

Also, I think you may find the core.matrix facilities very helpful when moving to higher level abstractions (i.e. 2D matrices and higher order multi-dimensional arrays).

On Thursday, 13 June 2013 21:50:48 UTC+1, Jason Wolfe wrote:
>
> Taking a step back, the core problem we're trying to solve is just to sum
> an array's values as quickly as in Java. (We really want to write a fancier
> macro that allows arbitrary computations beyond summing that can't be
> achieved by just calling into Java, but this simpler task gets at the crux
> of our performance issues).
>
> This Java code:
>
>   public static double asum_noop_indexed(double[] arr) {
>     double s = 0;
>     for (int i = 0; i < arr.length; i++) {
>       s += arr[i];
>     }
>     return s;
>   }
>
> can run on an array with 10k elements in about 8 microseconds.
> In contrast, this Clojure code (which I believe used to be as fast as the
> Java in a previous Clojure version):
>
>   (defn asum-identity [^doubles a]
>     (let [len (long (alength a))]
>       (loop [sum 0.0
>              idx 0]
>         (if (< idx len)
>           (let [ai (aget a idx)]
>             (recur (+ sum ai) (unchecked-inc idx)))
>           sum))))
>
> executes on the same array in about 40 microseconds normally, or 14
> microseconds with *unchecked-math* set to true. (We weren't using
> unchecked-math properly until today, which is why we were doing the hacky
> interface stuff above, please disregard that -- but I think the core point
> about an extra cast is still correct).
>
> For reference, (areduce a1 i r 0.0 (+ (aget a1 i) r)) takes about 23 ms to
> do the same computation (with unchecked-math true).
>
> Does anyone have ideas for how to achieve parity with Java on this task?
> They'd be much appreciated!
>
> Thanks, Jason
>
> On Thursday, June 13, 2013 12:02:56 PM UTC-7, Leon Barrett wrote:
>>
>> Hi. I've been working with people at Prismatic to optimize some simple
>> math code in Clojure. However, it seems that Clojure generates an
>> unnecessary type check that slows our (otherwise-optimized) code by 50%. Is
>> there a good way to avoid this, is it a bug in Clojure 1.5.1, or something
>> else? What should I do to work around this?
>>
>> Here's my example. The aget seems to generate an unnecessary
>> checkcast bytecode. I used Jasper and Jasmin to decompile and recompile
>> Bar.class into Bar_EDITED.class, without that bytecode. The edited version
>> takes about 2/3 the time.
>>
>>   (ns demo
>>     (:import demo.Bar_EDITED))
>>
>>   (definterface Foo
>>     (arraysum ^double [^doubles a ^int i ^int asize ^double sum]))
>>
>>   (deftype Bar []
>>     Foo
>>     (arraysum ^double [this ^doubles a ^int i ^int asize ^double sum]
>>       (if (< i asize)
>>         (recur a (unchecked-inc-int i) asize (+ sum (aget a i)))
>>         sum)))
>>
>>   (defn -main [& args]
>>     (let [bar (Bar.)
>>           bar-edited (Bar_EDITED.)
>>           asize 10000
>>           a (double-array asize)
>>           i 0
>>           ntimes 10000]
>>       (time
>>         (dotimes [iter ntimes]
>>           (.arraysum bar a i asize 0)))
>>       (time
>>         (dotimes [iter ntimes]
>>           (.arraysum bar-edited a i asize 0)))))
>>
>>
>>   ;; $ lein2 run -m demo
>>   ;; Compiling demo
>>   ;; "Elapsed time: 191.015885 msecs"
>>   ;; "Elapsed time: 129.332 msecs"
>>
>>
>> Here's the bytecode for Bar.arraysum:
>>
>>   public java.lang.Object arraysum(double[], int, int, double);
>>     Code:
>>        0: iload_2
>>        1: i2l
>>        2: iload_3
>>        3: i2l
>>        4: lcmp
>>        5: ifge          39
>>        8: aload_1
>>        9: iload_2
>>       10: iconst_1
>>       11: iadd
>>       12: iload_3
>>       13: dload         4
>>       15: aload_1
>>       16: aconst_null
>>       17: astore_1
>>       18: checkcast     #60   // class "[D"
>>       21: iload_2
>>       22: invokestatic  #64   // Method clojure/lang/RT.intCast:(I)I
>>       25: daload
>>       26: dadd
>>       27: dstore        4
>>       29: istore_3
>>       30: istore_2
>>       31: astore_1
>>       32: goto          0
>>       35: goto          44
>>       38: pop
>>       39: dload         4
>>       41: invokestatic  #70   // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
>>       44: areturn
>>
>>
>> As far as I can tell, Clojure generated a checkcast opcode that tests on
>> every loop to make sure the double array is really a double array. When I
>> remove that checkcast, I get a 1/3 speedup (meaning it's a 50% overhead).
>>
>> Can someone help me figure out how to avoid this overhead?
>>
>> Thanks.
>>
>> - Leon Barrett
>>
>

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
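
For anyone who wants to reproduce the Java baseline quoted in the thread, a minimal timing sketch might look like the following. The summing loop is the one from Jason's message; the class name AsumBaseline, the meanNanosPerCall helper, and the warm-up and iteration counts are assumptions made for illustration and are not from the thread. A hand-rolled loop like this is much cruder than Criterium, so treat its numbers as rough.

```java
// Hypothetical harness: only asum_noop_indexed comes from the thread.
public class AsumBaseline {
    // Written on every call so the JIT cannot discard the summing loop.
    private static volatile double sink;

    // The loop quoted in the thread, unchanged.
    public static double asum_noop_indexed(double[] arr) {
        double s = 0;
        for (int i = 0; i < arr.length; i++) {
            s += arr[i];
        }
        return s;
    }

    // Runs `warmup` untimed calls, then returns the mean time in
    // nanoseconds over `iters` timed calls.
    public static double meanNanosPerCall(double[] arr, int warmup, int iters) {
        for (int i = 0; i < warmup; i++) {
            sink = asum_noop_indexed(arr);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink = asum_noop_indexed(arr);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iters;
    }

    public static void main(String[] args) {
        // Same input as (double-array (range 10000)) in the Clojure benchmark.
        double[] a = new double[10000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        System.out.println("sum = " + asum_noop_indexed(a));
        // Warm-up and iteration counts are arbitrary choices.
        double nanos = meanNanosPerCall(a, 20000, 20000);
        System.out.printf("mean: %.3f us per call%n", nanos / 1000.0);
    }
}
```

The warm-up pass is there so that the timed loop mostly measures JIT-compiled code, which is the same concern Criterium handles more carefully on the Clojure side.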