No, you cannot rely on pmap to do that.
pmap produces its output sequence lazily, so it tries not to work
farther ahead of the consumer of that sequence than the amount of
parallelism it uses, which is the number of available processors
plus 2.
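
As a quick check of that number on your own machine, here is a minimal
sketch. The name pmap-parallelism is hypothetical (it is not part of
clojure.core); it only restates the processors-plus-2 rule described
above.

```clojure
;; Hypothetical helper, not part of clojure.core: the amount of
;; parallelism described above, i.e. available processors plus 2.
(defn pmap-parallelism []
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

(println "pmap keeps roughly" (pmap-parallelism) "calls in flight")
```
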
Suppose you have 4 available processors, so pmap tries to keep at most
6 parallel invocations of your map function running. Suppose also for
example's sake that the function you are using with pmap on each
element is sequential, so it uses at most one processor at a time.
Imagine that the numbers below are the times in seconds required to
calculate your function on each element of the input sequence to pmap:

  100 1 1 1 1 (15 more elements requiring 1 second each)
  100 1 1 1 1 (15 more elements requiring 1 second each)

When some code tries to retrieve the first element of the lazy seq
that is the output of pmap, 6 threads will start calculating. The
threads calculating the function on the 2nd through 6th elements
will finish soon (if scheduling is fair), but no thread will be
started for the 7th element yet, because pmap is trying not to work
too far ahead. When the function is done calculating the first
element, and some consumer tries to access the 2nd element of the
output sequence, a thread will start working on the 7th input
element, and so on.
So if the consumer of the output sequence is trying to retrieve
elements as quickly as possible and doing some negligible amount of
processing on each one, here will be the rough pattern of CPU busy time:
(1) about 1.5 seconds with all 4 processors busy, shared by the 6
threads for the first 6 elements; the five 1-second elements finish.
(2) about 99 seconds more to finish the function on the first
element, with only 1 processor busy and the other 3 idle.
(3) about 14/4 = 3.5 seconds with all 4 processors busy on the next 14
elements, each taking about 1 second of CPU time.
Then this pattern repeats when the next "heavy element" requiring 100
seconds to calculate the function is reached.
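
To put rough numbers on that pattern, here is a back-of-the-envelope
sketch using only the figures from the example (4 processors, 2 heavy
elements of 100 seconds, 19 light elements of 1 second after each).
The var names are mine, and the estimate ignores the small overlap
when the next heavy element starts early. The "lower bound" is the
usual one for any scheduler: the larger of the longest single task
and the total work divided by the number of processors.

```clojure
(def processors 4)
(def window (+ processors 2))   ; calls pmap keeps in flight: 6
(def heavy 100)                 ; seconds for a heavy element
(def light 1)                   ; seconds for a light element
(def period 20)                 ; 1 heavy element followed by 19 light ones
(def periods 2)

;; Phase 1: 6 calls share 4 processors until the 5 light ones finish.
(def phase-1 (/ (* window light) (double processors)))
;; Phase 2: only the heavy call remains; it has had about 1 s of CPU so far.
(def phase-2 (- heavy light))
;; Phase 3: the remaining 14 light elements of the period, 4 at a time.
(def phase-3 (/ (* (- period window) light) (double processors)))

;; Estimated wall-clock time with pmap over both periods.
(def pmap-estimate (* periods (+ phase-1 phase-2 phase-3)))

;; Total CPU work, and a lower bound on wall-clock time for any scheduler.
(def total-work (* periods (+ heavy (* (dec period) light))))
(def lower-bound (max heavy (/ total-work (double processors))))

(println "pmap estimate (s):" pmap-estimate)
(println "lower bound (s):  " lower-bound)
(println "utilization:      " (/ total-work (* processors pmap-estimate)))
```

Under these assumptions the pmap estimate comes out at about twice the
lower bound, because the two heavy elements never run at the same time.
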
Note: a chunked input sequence can work against pmap's "don't work too
far ahead" approach. When a chunk is reached, threads are started in
parallel for all elements in that chunk, so the number of parallel
threads can easily exceed (availableProcessors + 2) during those
times.
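
The chunking effect can be observed with a small experiment. This is
only a sketch: unchunk and started-after-first are hypothetical
helpers of mine, and the 200 ms sleep is an arbitrary pause to let
already-submitted calls run. It counts how many calls pmap has started
when the consumer has asked for just the first result, once with a
chunked input (a range) and once with the same input re-wrapped one
element at a time.

```clojure
;; Hypothetical helper: re-wrap a seq so it is handed out one element
;; at a time instead of in chunks.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Hypothetical helper: how many calls has pmap started once the
;; consumer has asked for only the first result?
(defn started-after-first [coll]
  (let [started (atom 0)
        results (pmap (fn [x] (swap! started inc) x) coll)]
    (first results)
    (Thread/sleep 200)   ; let calls that were already submitted run
    @started))

(def chunked-count   (started-after-first (range 100)))
(def unchunked-count (started-after-first (unchunk (range 100))))

(println "chunked input:  " chunked-count "calls started")
(println "unchunked input:" unchunked-count "calls started")
```

I would expect the chunked count to be at least a full chunk, and the
unchunked count to stay close to the number of processors plus 2, but
treat that as an expectation to verify on your own machine.
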
Amit Rathore's medusa is intended to let you keep processors busy, but
with a different method of invoking it than pmap.
Andy
On Jan 23, 2011, at 5:56 PM, Michael Gardner wrote:
Suppose I have a sequence of tasks I'd like to parallelize using
pmap. The amount of CPU time these tasks require varies greatly; in
particular, many of them will require virtually no work. Can I rely
on pmap to divide the work efficiently even if there is some pattern
to the distribution of easy tasks (e.g. clumped at the beginning, or
every nth)? Assume it's not possible to identify the easy tasks
beforehand.
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient
with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en