It isn't hard to write your own variation of pmap that uses no more
parallelism than you want, regardless of whether the input sequence is
chunked. I wrote one for a Clojure submission to the Computer
Language Benchmarks Game a year or so ago. Besides avoiding unwanted
parallelism for chunked sequences, it also takes a parameter specifying
the desired maximum number of parallel threads:

(defn my-lazy-map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (my-lazy-map f (rest s))))))

;; modified-pmap is like pmap from Clojure 1.1, but with only as much
;; parallelism as specified by the parameter num-threads. Uses
;; my-lazy-map instead of map from core.clj, since that version of map
;; can use unwanted additional parallelism for chunked collections,
;; like ranges.

(defn modified-pmap
  ([num-threads f coll]
     (if (== num-threads 1)
       (map f coll)
       (let [n (if (>= num-threads 2) (dec num-threads) 1)
             rets (my-lazy-map #(future (f %)) coll)
             step (fn step [[x & xs :as vs] fs]
                    (lazy-seq
                      (if-let [s (seq fs)]
                        (cons (deref x) (step xs (rest s)))
                        (map deref vs))))]
         (step rets (drop n rets)))))
  ([num-threads f coll & colls]
     (let [step (fn step [cs]
                  (lazy-seq
                    (let [ss (my-lazy-map seq cs)]
                      (when (every? identity ss)
                        (cons (my-lazy-map first ss)
                              (step (my-lazy-map rest ss)))))))]
       (modified-pmap num-threads #(apply f %) (step (cons coll colls))))))

I'm not sure what you mean by "side effects are inconsistent". If you mean
side effects in terms of mutating state, then I wouldn't recommend using
pmap with a function that has side effects, unless you can somehow guarantee
that the order in which those calls are evaluated does not matter for the
final result.
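
To make the ordering concern concrete, here is a minimal sketch. It is not
from the benchmark code; side-effect-log and noisy-inc are names made up for
illustration. The returned values keep the input order, but the order in
which the side effects happen is up to the thread scheduler.

```clojure
;; Illustrative only: side-effect-log and noisy-inc are hypothetical names.
(def side-effect-log (atom []))

(defn noisy-inc
  "Increments x, recording the call in side-effect-log as a side effect."
  [x]
  (swap! side-effect-log conj x)
  (inc x))

;; doall forces the whole lazy result, so every future has finished
;; by the time results is defined.
(def results (doall (pmap noisy-inc (range 8))))

;; results is in input order: (1 2 3 4 5 6 7 8)
;; @side-effect-log contains 0 through 7, but in whatever order the
;; threads happened to run, which may differ from run to run.
```

If the final answer depends on the order recorded in side-effect-log, the
program is likely to give different answers on different runs.
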
If you mean "side effect" as in how many parallel threads can be created at
one time, then yes, there are differences in how pmap behaves depending upon
whether it is given a chunked sequence or not.
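
One rough way to observe the chunking difference is to count how many times
the mapped function runs when only the first element is consumed. This is a
sketch under assumptions: unchunked-map has the same shape as my-lazy-map
above, and calls-to-get-first is a helper name invented here.

```clojure
;; Same shape as my-lazy-map: realizes exactly one element per step.
(defn unchunked-map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (unchunked-map f (rest s))))))

(defn calls-to-get-first
  "Returns how many times the mapped function was called in order to
  produce only the first element of (map-fn f coll)."
  [map-fn coll]
  (let [calls (atom 0)]
    (first (map-fn (fn [x] (swap! calls inc) x) coll))
    @calls))

;; With core map over a chunked seq such as a range, a whole chunk is
;; realized at once (32 elements on the Clojure versions I know of).
(calls-to-get-first map (range 100))

;; With the unchunked version, only the one requested element is realized.
(calls-to-get-first unchunked-map (range 100))
```

Replace the counting function with #(future (f %)) and the count above
becomes the number of futures started at once, which is the effect
discussed in the quoted message below.
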
Andy
On Fri, Oct 21, 2011 at 7:37 PM, Marshall T. Vandegrift
<[email protected]> wrote:
> Stefan Kamphausen <[email protected]> writes:
>
> > Chunked seqs are supposed to realize more elements than you
> > consume. That's for performance reasons. But since you will only ever
> > apply side-effect-free functions to seqs, that will make no
> > difference, no?
>
> Sorry, yes, I'm talking about within the code of `pmap'. It creates a
> lazy seq of futures of application of the passed-in function to the
> passed-in collection via (map #(future (f %)) coll). Realizing elements
> of *that* seq has the side-effect of allocating/spawning a thread from
> the futures thread-pool. If `coll' can be turned into a chunked seq,
> then the futures will be realized -- and threads allocated or spawned --
> in chunks of 32. If `coll' cannot be turned into a chunked seq, then only
> (+ 2 #CPUS) threads will be allocated/spawned at a time.
>
> I think clarifying that has convinced me that this is definitely a bug,
> just because the side-effects are inconsistent. I don't think that the
> chunkability (chunkiness?) of the collection argument should affect the
> degree of parallelism.
>
> -Marshall
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>