Re: pmap uses more parallelism than intended due to use of "eager" map?

Andy Fingerhut Tue, 04 Aug 2009 15:38:59 -0700

Ugh.  A couple of minutes of more careful searching and I had the
answer.  The map used by pmap is not eager.  The cause is that map is
optimized for chunked input collections, so it processes a whole
chunk of elements at once.
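
To illustrate the chunking effect, here is a minimal sketch.  The names
calls and noisy-inc are made up for this example, and it assumes a
Clojure build in which (range 100) yields a chunked seq.  Asking for only
the first element of the mapped sequence still applies the function to
every element of the first chunk:

```clojure
;; Hypothetical helper names; not part of clojure.core.
(def calls (atom 0))

(defn noisy-inc
  "Increments x and records that it was called."
  [x]
  (swap! calls inc)
  (inc x))

;; Only the first element is requested, but map applies noisy-inc to
;; the whole first chunk of the input (assuming the input is chunked).
(first (map noisy-inc (range 100)))

(println "calls after asking for one element:" @calls)
```

If the function passed to map creates a future, as in pmap, every
element of the chunk gets its own future at that moment.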


It seems worth considering modifying pmap to either:

(1) use a lazy version of map, with no optimization for chunked
collections.  That's pretty quick and easy.

or

(2) modify the pmap implementation so that even if map is optimized
for chunked collections, pmap doesn't use more parallelism than
intended.  That would require a more significant code change in pmap.
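
Option (1) could look roughly like the sketch below.  It is not the code
from the linked test file: unchunked-map and pmap-unchunked are
hypothetical names, and the step function is assumed to follow the
single-collection arity of pmap in core.clj.  The only intended change is
that futures are created one element at a time instead of one chunk at a
time:

```clojure
(defn unchunked-map
  "Hypothetical one-element-at-a-time map; ignores chunking of coll."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (unchunked-map f (rest s))))))

(defn pmap-unchunked
  "Sketch of a single-collection pmap built on unchunked-map."
  [f coll]
  (let [n    (+ 2 (.. Runtime getRuntime availableProcessors))
        ;; Each future is created only when its element is realized.
        rets (unchunked-map #(future (f %)) coll)
        ;; Stay roughly n futures ahead of the element being consumed.
        step (fn step [[x & xs :as vs] fs]
               (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
    (step rets (drop n rets))))

;; Usage: count how many tasks have started after asking for one result.
(def started (atom 0))

(defn slow-inc [x]
  (swap! started inc)
  (Thread/sleep 50)
  (inc x))

(def results (pmap-unchunked slow-inc (range 10000)))
(first results)
(println "tasks started so far:" @started)
```

With this version the number of tasks started should stay close to n
rather than jumping to the size of the first chunk.  When run as a
script, calling (shutdown-agents) at the end lets the JVM exit promptly.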

Thanks,
Andy


On Aug 4, 1:31 pm, Andy Fingerhut <andy_finger...@alum.wustl.edu>
wrote:
> I was looking into the question raised in the "Question about pmap"
> thread, and noticed that on my Mac and on a Linux virtual machine, a
> recent git version of clojure (about 1 week old) seems to use more
> parallelism in 'pmap' than its source code in core.clj would imply is
> the intent.  The code there seems to imply that the desired number of
> threads to run at once is 2 more than the number of available
> processors.  However, when I run it, it always fires off one thread
> for every element of the collection, right at the beginning,
> regardless of the number of available processors.
>
> I created my own slightly modified version of pmap, where the only
> difference is that the version of 'map' used has been changed to
> 'my-lazy-map', which is explicitly lazy.  See this link for the source
> code of the test I was using.  Ignore modified-pmap1.  Look at
> modified-pmap2.
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/0227501c6c53736...
>
> You can try this out on your system with a command line like this:
>
> clj pmap-test.clj 10 100000000
>
> You might need to adjust the second number to be more or less,
> depending upon the speed of your machine.  It would be good to make it
> take at least a couple of seconds.  The first number is the length of
> the collection over which map/pmap is run, and for most obvious
> results should be about 3 to 4 times the number of processors on your
> machine.
>
> I suspect that the issue is that the version of map used by pmap in
> core.clj is eager, i.e. not lazy, and thus all of the calls to future
> are performed as soon as the last of these lines in the pmap function
> is executed:
>
> (defn pmap
>   ;; doc string deleted for brevity
>   ([f coll]
>    (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
>          rets (map #(future (f %)) coll)
>
> but I don't know yet where the implementation of map that is "active"
> at that point in the source code is.  Perhaps it is implemented in
> Java?
>
> Andy

