Re: Parallel doseq?

Cedric Greevey Thu, 24 May 2012 12:22:38 -0700

Sorry. *Something* is apparently messing with my outbound messages.
I'm not sure what, why, or how. Characters are moved or substituted at
random times, sometimes with unfortunate results.


Meanwhile, I had tried using pcalls as well but was getting spurious
behavior. I wound up with:

(defmacro pdoseq
  "Bindings as for `for`, but parallel execution as per pmap, pcalls,
   pvalues; returns nil."
  [seq-exprs & body]
  `(let [procs# (+ 2 (.availableProcessors (Runtime/getRuntime)))
         calls# (for ~seq-exprs (fn [] ~@body))
         threads# (for [i# (range procs#)]
                    (Thread.
                      #(doall
                         (map (fn [x#] (x#))
                           (take-nth procs#
                              (drop i# calls#))))))]
     (doseq [t# threads#]
       (.start t#))
     (doseq [t# threads#]
       (.join t#))))

The body is wrapped in a function so realizing an element of the seq
doesn't immediately execute it. Threads are created that will skip to
every nth element and invoke the function there, each starting offset
from the others. The threads are then all started, and then all joined
so the pdoseq call doesn't return until the job's done, just as plain
doseq has a synchronous return.
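
To make the striding concrete, here is a small sketch of how (take-nth n (drop i coll)) deals items out to n workers. The names items, n and slice are made up for the illustration and are not part of the macro:

```clojure
;; Illustration only: `items`, `n` and `slice` are hypothetical names.
(def items (range 10))
(def n 3)

(defn slice
  "Items that worker i (0-based) would process out of n workers."
  [i]
  (take-nth n (drop i items)))

(slice 0) ;; => (0 3 6 9)
(slice 1) ;; => (1 4 7)
(slice 2) ;; => (2 5 8)

;; Every item lands in exactly one slice.
(= (sort (mapcat slice (range n))) items) ;; => true
```

Each worker walks the same underlying seq from a different offset, which is why the slices interleave rather than forming contiguous blocks.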

One potential issue is that if one thread gets well ahead of the
others, a big chunk of the for seq is held onto while the others catch
up. So it's suitable for up to thousands of items that are
individually expensive. If you had millions of individually cheap
items you'd need another strategy.

A nested iteration like (doseq [x s1 y s2] foo) could be split into
(pdoseq [x s1] (doseq [y s2] foo)), so that whole inner loops are the
granularity of the parallel jobs. That dilutes each anonymous function
call's overhead over a larger portion of the total work, and the size
of the potentially-held-onto seq is only the size of the outer loop.
For example, if you are looping over a 1024x768 array, easily possible
with some image manipulation jobs, the maximum size of the held-onto
seq would be 1024 instead of 786432, and the added function call
overhead would be incurred only once every row rather than once every
pixel.
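
A rough sketch of the difference in job counts, using the 1024x768 figures. The helper names flat-jobs and outer-jobs are hypothetical, and the job bodies are placeholders standing in for foo:

```clojure
;; Hypothetical helpers: they build the seq of closures that a pdoseq-style
;; macro would create, without running them.
(defn flat-jobs
  "One closure per (x, y) pair, as in (pdoseq [x s1 y s2] foo)."
  [s1 s2]
  (for [x s1 y s2] (fn [] [x y])))

(defn outer-jobs
  "One closure per x, each running a whole inner loop, as in
   (pdoseq [x s1] (doseq [y s2] foo))."
  [s1 s2]
  (for [x s1] (fn [] (doseq [y s2] [x y]))))

(count (flat-jobs  (range 1024) (range 768))) ;; => 786432
(count (outer-jobs (range 1024) (range 768))) ;; => 1024
```

The total work is the same in both forms; only the number of closures, and so the worst-case length of the held-onto seq, changes.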

An interesting question is why something like this isn't already in
the standard library. The supplied parallel functions (pmap, pcalls,
pvalues) don't have for bindings and don't seem to be adaptable to use
them.

Another thing that I thought could be handy is a lazy vector -- each
element is realized only when retrieved. Backed by a vector of delays,
of course. And super-lazy seqs and vectors could be built on a "weak
delay": something like delay, except that it uses a WeakReference to
cache its value. If the cached value goes away, the code that computed
it will run again the next time it's needed (so it had better not have
side effects). Lazy vectors could even be backed by a function with a
single integer argument -- it seems unlikely you could define one in
another way than as some sort of mapping from index to
value-to-produce anyway. There'd need to be
some kind of tree to hold the cache as well, perhaps one suited to
represent sparse vectors for good performance and low memory use when
few elements were realized. The usual 32^n trees, but with some
branches omitted with nils, might do the job; as it filled in when
elements were realized it would turn into a normal vector as far as
memory use was concerned. In the super-lazy case a (mutable!) map with
weak values (there's one in the Apache Commons) might be preferable to
a tree holding individual weak delays.
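
A minimal sketch of the two simplest pieces, assuming a plain vector of delays for the lazy vector and a closure over a WeakReference for the weak delay. The names lazy-vec, lazy-nth and weak-delay are hypothetical, nothing here is in clojure.core, and the sparse tree and weak-valued map variants are not attempted:

```clojure
(import 'java.lang.ref.WeakReference)

(defn lazy-vec
  "Vector of n delays; element i is (f i), computed on first access.
   Hypothetical helper. Allocates all n delays up front."
  [n f]
  (mapv (fn [i] (delay (f i))) (range n)))

(defn lazy-nth
  "Realize (if needed) and return element i of a lazy-vec."
  [lv i]
  (force (nth lv i)))

(defn weak-delay
  "Returns a no-arg fn that caches (f) behind a WeakReference and recomputes
   it if the cached value has been collected. Assumptions: f is side-effect
   free and never returns nil or false, since a falsey result is treated
   here as 'not cached'. Two threads may both compute the value."
  [f]
  (let [cache (atom (WeakReference. nil))]
    (fn []
      (or (.get ^WeakReference @cache)
          (let [v (f)]
            (reset! cache (WeakReference. v))
            v)))))

(def squares (lazy-vec 5 (fn [i] (* i i))))
(lazy-nth squares 3)          ;; => 9
(realized? (nth squares 3))   ;; => true
(realized? (nth squares 4))   ;; => false

(def greeting (weak-delay (fn [] (str "hello " "world"))))
(greeting)                    ;; => "hello world"
```

Whether and when a weakly cached value is actually collected is up to the garbage collector, so the only thing a caller can rely on is that (greeting) returns an equal value each time.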

How are lazy vectors and super-lazy things relevant to this thread?
Well, the above already implements something like a super-lazy seq, in
that a compact closure stands in for each realized item and must be
called each time to produce the true seq element. The only thing
missing is the weak caching. And a super-lazy vector would be
especially easy to use in a parallelized manner because it's random
access. The offsets i# and step sizes procs# that the above code
produces, instead of being fed to (take-nth procs# (drop i# calls#)) to
get functions to invoke, would be fed to (map lvec (range i# (count
lvec) procs#)) to get the actual elements.
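
As a sketch of that indexing scheme, with an ordinary vector standing in for the lazy vector (lvec, procs and worker-elements are example names, not part of the macro):

```clojure
;; A plain vector stands in for the lazy vector; values are illustrative.
(def lvec [:a :b :c :d :e :f :g :h :i :j])
(def procs 3)

(defn worker-elements
  "Elements that worker i would fetch by index, stepping by procs."
  [i]
  (map lvec (range i (count lvec) procs)))

(worker-elements 1) ;; => (:b :e :h)

;; Same elements as the seq-based striding, but reached by random access
;; instead of walking a shared seq.
(= (worker-elements 1) (take-nth procs (drop 1 lvec))) ;; => true
```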
