Re: Working with a huge graph - how can I make Clojure performant?

Niels van Klaveren Thu, 28 Mar 2013 15:18:52 -0700

That's quoting far out of context Alan. All Christophe says in his blog is 
he dislikes the statefulness of most implementations of Tarjan, and shows 
how this isn't needed, and can be done in a functional way.


You could have stated the arguments why you think your version is superior, 
and it might very well be, but don't picture it like Christophe said he 
doesn't like his own implementation.

On Thursday, March 28, 2013 7:38:41 PM UTC+1, Alan Malloy wrote:
>
> Have you looked at https://github.com/jordanlewis/data.union-find ? 
> Personally, I'd prefer it to Christophe's implementation, since his blog 
> post seems to start with "I dislike this algorithm"; I also helped out a 
> bit in writing this version.
>
> On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote:
>>
>> Hey,
>>
>>
>> I got an assignment to implement an algorithm to calculate strongly 
>> connected components in a graph (
>> http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
>> big, it has ~900.000 vertices.
>>
>>
>> In its first pass, it needs to do a depth-first search on the graph and 
>> calculate the finishing time for each node (the finishing time for a node 
>> is a number from 0…V-1 where V is the number of vertices). Potentially 
>> several depth-first search need to be launched (done in the finishing-times 
>> function below) to discover all components.
>>
>>
>> The version I pasted below is the most performant. It discovers ~600.000 
>> vertices (2/3 of all vertices). However, on subsequent 
>> dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
>> additional nodes/second) and I'm not sure why.
>>
>>
>> Here are a few things I tried to speed up the algorithm:
>>
>>
>> * rewrite to a recursive function (not tail-recursive) and use lazy-seqs
>>
>> * define dfs-by-finishing-times in finishing-times so that the whole 
>> graph (G) does not have to be passed at each call.
>>
>> * use persistent data structures everywhere instead of transients (I know 
>> this would only slow things down but it did not hurt to try)
>>
>> * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
>> very proficient with profilers but I could not extract any valuable 
>> information
>>
>>
>> Here are the code snippets in question:
>>
>> (I also pasted it here: 
>> https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
>> readability)
>>
>> (defn dfs-by-finishing-times
>>
>>   ([G u]
>>
>>      (dfs-by-finishing-times G u #{}))
>>
>>   ([G u explored]
>>
>>      (loop [[v & vs :as stack] (list u), explored (transient explored), 
>> lhalf [], rhalf [],  iter-cnt 0]
>>
>>          (if (seq stack)
>>
>>           (let [neighbors (persistent!
>>
>>                            (reduce
>>
>>                             (fn [c u] (if (explored u) c (conj! c u)))
>>
>>                             (transient [])
>>
>>                             (G v)))]
>>
>>             (cond
>>
>>              (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))
>>
>>              (empty? neighbors) (recur vs (conj! explored v) (conj lhalf 
>> v) rhalf (inc iter-cnt))
>>
>>              :else (recur (reduce (fn [stack e] (cons e stack)) vs 
>> neighbors)
>>
>>                        (conj! explored v)
>>
>>                        lhalf
>>
>>                        (cons v rhalf)
>>
>>                        (inc iter-cnt))))
>>
>>           (concat lhalf rhalf)))
>>
>>      ))
>>
>>
>> (defn finishing-times [G vertices]
>>
>>   "The first pass of Kosaraju's algorithm.
>>
>>    Scan the transpose graph of G, and mark the finishing time for each.
>>
>>    G should already be the transposed graph"
>>
>>   (loop [[u & vs :as stack] (seq vertices)
>>
>>           explored #{},
>>
>>           finished []]
>>
>>      (if (nil? u)
>>
>>        finished
>>
>>        (let [path (dfs-by-finishing-times G u explored)
>>
>>              new-explored (into explored path)]
>>
>>          (recur (remove new-explored vs)
>>
>>                 new-explored
>>
>>                 (into finished path))))))
>>
>> Do you have any insights into what technique I could use to speed up my 
>> algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
>> what. Presumably the whole algorithm runs under 10 seconds in C# on the 
>> same graph so this is rather embarrassing :)
>>
>> Appreciate your help,
>>
>> Balint
>>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Working with a huge graph - how can I make Clojure performant?

Reply via email to