Re: pool-map - a more disciplined version of pmap on top of executors

Marko Topolnik Tue, 15 Jan 2013 01:25:15 -0800

On Tuesday, January 15, 2013 12:36:04 AM UTC+1, Jim foo.bar wrote:

>  On 14/01/13 22:15, Marko Topolnik wrote:
>  
> But... you were quite clear in stating that your function isn't lazy, and 
> you were right to say that: *doall* will not return until *all* the tasks 
> have completed. Maybe you did want laziness, after all?
>
>
> I'm not entirely sure what you mean here...The only real reason I am using 
> 'doall' is so that I can shutdown the ExecutorService, otherwise I wouldn't 
> mind returning a lazy-seq. In fact that is how I originally had it - I was 
> never shutting down the pool until recently! Inside the doall however I'm 
> polling the pool for answers and *it* looks for finished jobs...I'm not 
> deref-ing the futures in a serial manner which has higher chances of 
> bocking...Is my thinking not correct?  
>


The order in which you are polling is not very relevant given the fact that 
*doall* won't return until *all* futures are realized. It's just an 
internal detail. As for shutting down the executor service, it is safe to 
do so immediatly after all tasks have been submitted:

(defn pool-map [f coll]
  (let [exec (Executors/newFixedThreadPool (.availableProcessors 
(Runtime/getRuntime)))]
    (->> (try (.invokeAll exec (for [x coll] #(f x))) (finally (.shutdown 
exec)))
         (map #(.get ^FutureTask %)))))
 

> Yet another issue with your function is that it doesn't submit all the 
> tasks up front so it clearly won't prevent longer tasks from delaying the 
> shorter ones. But, even if they are submitted all at once, like in my 
> version, the executor service will still enqueue them internally, with the 
> same end result. You'd need an unbounded thread pool, and that would 
> obviously bring in other issues.
>
>
> You're quite right here...Again using 'for' was a remnant from the old 
> version...Perhaps 'mapv' would be a better choice for submitting the 
> jobs...In any case, is this  executor-based approach (with or without the 
> completion service) encouraged? Would you care if tomorrow pmap started 
> using this approach?
>

Using *mapv* (or simply *doall*, same thing) won't solve the root problem 
of tasks waiting in executor's own queue.

Taking into account all that was said, *pool-map* can't offer much more 
than *pmap*. You can't know which tasks will take less time until they are 
already done. It is theoretically impossible to pre-order them according to 
execution time, thereby harvesting the results of the fastest ones earlier, 
eventually promoting total concurrency. The way to go is grinding the tasks 
down to chunks of predictable runtime and submitting them to a fork/join 
solver, which will optimize concurrency through work-stealing.
 

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: pool-map - a more disciplined version of pmap on top of executors

Reply via email to