Andy Fingerhut <andy.finger...@gmail.com> writes:

> I'm not practiced in recognizing megamorphic call sites, so I could be
> missing some in the example code below, modified from Lee's original
> code. It doesn't use reverse or conj, and as far as I can tell
> doesn't use PersistentList, either, only Cons.
> ...
> Can you try to reproduce to see if you get similar results? If so, do
> you know why we get bad parallelism in a single JVM for this code? If
> there are no megamorphic call sites, then it is examples like this
> that lead me to wonder about locking in memory allocation and/or GC.

I think your benchmark is a bit different from Lee's original. The
`reverse`-based versions allocate heavily as they repeatedly reverse a
sequence, but each thread holds a sequence of length at most 10,000 at
any given time. In your benchmark, each thread holds a sequence of up
to 2,000,000 elements, for a naive 200x increase in memory pressure and
a potential increase in the number of objects being promoted out of the
young generation.

I ran your benchmark under a version of Cameron's criterium-based
speed-up measurement wrapper that I've modified to pass in the `pmap`
function to use. I reduced the number of iterations in your algorithm
by a factor of 5 to get it to run in a reasonable amount of time, and I
ran it using default JVM GC settings on a 32-way AMD system.
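For reference, here is a minimal sketch of the kind of measurement I mean. This is not Cameron's actual wrapper: the names `time-ms` and `speedup` are made up for illustration, and it uses a crude wall-clock timer instead of criterium. The point is just that the map function under test is passed in as an argument:

```clojure
;; Minimal sketch of a map-vs-pmap speedup measurement. NOT Cameron's
;; criterium-based wrapper; `time-ms` and `speedup` are hypothetical
;; names, and this uses plain wall-clock timing rather than criterium.
(defn time-ms
  "Run thunk f n times; return mean wall-clock time per run in ms."
  [n f]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (- (System/nanoTime) start) 1e6 n)))

(defn speedup
  "Compare sequential map against a pmap-like function pmap-fn on
  workload work applied to coll. dorun forces the lazy sequences so
  the work actually happens inside the timed region."
  [pmap-fn work coll]
  (let [smap-ms (time-ms 5 #(dorun (map work coll)))
        pmap-ms (time-ms 5 #(dorun (pmap-fn work coll)))]
    {:smap-ms smap-ms
     :pmap-ms pmap-ms
     :speedup (/ smap-ms pmap-ms)}))

;; Example: a CPU-bound workload, passing in clojure.core/pmap.
;; (speedup pmap (fn [x] (reduce + (range x))) (repeat 8 1000000))
```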
I get the following numbers for 1-32 way parallelism with a 500MB heap:

  andy  1 : smap-ms 7.5, pmap-ms  7.7, speedup 0.97
  andy  2 : smap-ms 7.8, pmap-ms  9.8, speedup 0.80
  andy  4 : smap-ms 8.5, pmap-ms 10.6, speedup 0.80
  andy  8 : smap-ms 8.6, pmap-ms 11.5, speedup 0.75
  andy 16 : smap-ms 8.1, pmap-ms 12.5, speedup 0.65
  andy 32 : [java.lang.OutOfMemoryError: Java heap space]

And these numbers with a 4GB heap:

  andy  1 : smap-ms 3.8, pmap-ms 4.0, speedup 0.95
  andy  2 : smap-ms 4.2, pmap-ms 2.1, speedup 2.02
  andy  4 : smap-ms 4.2, pmap-ms 1.7, speedup 2.48
  andy  8 : smap-ms 4.2, pmap-ms 1.2, speedup 3.44
  andy 16 : smap-ms 4.4, pmap-ms 1.0, speedup 4.52
  andy 32 : smap-ms 4.0, pmap-ms 1.6, speedup 2.55

I'm running out of time for breakfast experiments, but it seems
relatively likely to me that the increased at-once sequence size in
your benchmark is increasing the number of objects making it out of the
young generation. This in turn increases the number of stop-the-world
GCs, which become even more frequent at smaller heap sizes. I'll run
these again later with GC logging and report back if the results are
unexpected.

-Marshall

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en