Re: Style, Efficiency, and updating nested structure

Erik Assum Thu, 12 Nov 2015 06:50:12 -0800

I would like to argue that your code has to be pretty inefficient if it's 
creating a bigger performance impact than fetching from the database.


Erik. 
-- 
i farta

> Den 12. nov. 2015 kl. 15.36 skrev Dave Tenny <dave.te...@gmail.com>:
> 
> Thanks for the replies, as always lots of interesting ways to do it in 
> clojure.  No zipper sugestions?
> 
> For my own take, I happen to like the last suggestion (Gareth's) the most I 
> think, of the ones offered so far.  It has a lispy elegance and is still 
> relatively efficient in traversals and memory allocation, compared to some 
> solutions (particular with a revision to the merge-roles implementation to 
> eliminate all the set creation, see next sentence).
> 
> Some of you rewrote the problem a bit.  For example, the 'merge-roles' 
> conversion to sets is very expensive, especially since I know that the input 
> in my original problem isn't going to offer duplicate project rules for a 
> given project ID.
> 
> On efficiency in general, it is perhaps a generational thing that people say 
> "don't prematurely optimize" so often and then cons up intermediate objects 
> with wild abandon.  And while a well crafted stop and copy garbage collector 
> can be have negligable GC time given sufficient memory free space, 
> allocations are still not free even in the best case, not to mention the time 
> to build the object being allocated (like hashing/comparing all the values in 
> a set).
> 
> I'd like to suggest that there is a difference between premature optimization 
> and avoiding flagrantly inefficient code.
> If you code so that you routinely copy sequences and nested data structures 
> without a care, then in a serious/large system
> your performance will have death by a thousand cuts.  The big O matters, at 
> least in some applications.
> 
> As indicated in this problem, the input is coming from a database, i.e. a 
> potentially large record source. My bias is that per-record processing is 
> always best kept to a minimum.  So inverting maps, copying sequences, 
> creating sets and such _for every record_ is just a big "nope" for me.    And 
> that doesn't even consider what went into the retrieval of the records from 
> clojure.java.jdbc/query, where every record comes back as a map unless you do 
> something to prevent it.
> 
> Meanwhile there were good suggestions for both abstraction and efficiency 
> here, thanks.  I often forget about lazy-seq and have a tendency to lean 
> toward loop/recur, probably from too many years using Common Lisp.  And of 
> course elimination of the mutable change detection hack is good too.    So 
> nice to see the different ways people tackle the problem.
> 
> 
>> On Wed, Nov 11, 2015 at 10:02 PM, Gareth Clapperton <gclapper...@live.com> 
>> wrote:
>> Hey Daves
>> 
>> I too like the map idea (and using sets for roles for that matter). But 
>> sticking with collections, I might do something like this
>> 
>> (defn upsert 
>>   "Updates or inserts v into coll"
>>   [key-fn update-fn v coll]
>>   (let [k    (key-fn v)]          
>>     (letfn [(step [v [x & xs]]
>>               (lazy-seq
>>                 (cond
>>                   (nil? x)                         (when v [v])
>>                   (and (some? v) (= (key-fn x) k)) (cons (update-fn x v) 
>> (step nil xs))
>>                   :default                         (cons        x        
>> (step  v  xs)))))]
>>       (step v coll))))
>>   
>> (defn merge-roles [p1 {:keys [project_roles] :as p2}]
>>   (update-in p1 [:project_roles] #(vec (clojure.set/union (set %) (set 
>> project_roles)))))
>> 
>> (upsert :project_id merge-roles prj-role prj-roles)
>> 
>> Gareth
>> 
>> 
>>> On Wednesday, November 11, 2015 at 4:25:51 PM UTC-5, Dave Tenny wrote:
>>> A colleague and I are debating various things clojure as we were exploring 
>>> alternative ways to solve a problem.
>>> 
>>> Here's the description of the problem that a particular function is trying 
>>> to solve, and the first implementation of it.
>>> 
>>> (defn update-roles 
>>>   "Given a vector of maps of the form {:project_id N :project_name S 
>>> :project_roles [...roles...]}
>>>   if there is already a map for the indicated project id/name, add new-role 
>>> to it and returned
>>>   a copy the updated input vector, otherwise return a vector with a new map 
>>> entry for the newly found
>>>   project and initial role.  This function is basically aggregating tuples 
>>> from the database."
>>>   [projects project-id project-name new-role]
>>>   (let [updated? (atom nil)
>>> 
>>>         projects* (mapv (fn [m] 
>>>                           (if (= (:project_id m) project-id)
>>>                             (do (reset! updated? true)
>>>                                 (assoc m :project_roles (conj 
>>> (:project_roles m) new-role)))
>>>                             m))
>>>                         projects)]
>>>     (if @updated?
>>>       projects*
>>>       (conj projects {:project_id project-id :project_name project-name 
>>> :project_roles [new-role]}))))
>>> 
>>> 
>>> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles 
>>> [:own]}] 2 "Two" :edit)
>>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
>>> {:project_id 2, :project_name "Two", :project_roles [:edit]}]
>>> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles 
>>> [:own]}] 1 "Two" :edit)
>>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]}]
>>> 
>>> 
>>> 
>>> Now here's another implementation:
>>> 
>>> (defn update-or-insert-project-role
>>>   [prj-roles prj-role]
>>>   (let [to-insert-prj-id (:project_id prj-role)
>>>         by-pid           (group-by :project_id prj-roles)]
>>>     (case (get by-pid to-insert-prj-id)
>>>       nil (conj prj-roles prj-role)
>>>       (->> (update-in by-pid [to-insert-prj-id 0 :project_roles] #(apply 
>>> conj % (:project_roles prj-role)))
>>>            (mapcat second)
>>>            (into [])))))
>>> 
>>> ;; (def prj-roles [{:project_id 1, :project_name "One", :project_roles 
>>> [:own]} {:project_id 3 :project_name "Three" :project_roles [:edit]}])
>>> ;; (update-or-insert-project-role prj-roles {:project_id 2 :project_name 
>>> "Two" :project_roles [:edit]})
>>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
>>> {:project_id 3, :project_name "Three", :project_roles [:edit]} {:project_id 
>>> 2, :project_name "Two", :project_roles [:edit]}]
>>> ;; (update-or-insert-project-role prj-roles {:project_id 1 :project_name 
>>> "One" :project_roles [:edit]})
>>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]} 
>>> {:project_id 3, :project_name "Three", :project_roles [:edit]}]
>>> 
>>> 
>>> 
>>> The function is called in a loop to aggregate rows from a database, though 
>>> it isn't an overriding concern, we're not going millions of records in this 
>>> case.
>>> 
>>> The first thing my colleague and I disagreed on was the calling sequence, 
>>> arguing over which is more readable.
>>> The second thing was whether efficiency in this context is really 
>>> important, or whether it's all moot in clojure.  
>>> 
>>> Finally, I'm sure there's a better way, probably with Zippers or something, 
>>> but neither of us have used them. Suggestions for the stylistic and 
>>> operational epitome of clojure expression on this routine are welcome!
>>> 
>>> Superficially, and probably incorrect in multiple ways, here is a poor 
>>> attempt at breaking down efficiency in terms of search/traversal and memory 
>>> allocations.  This was done by someone with no knowledge of clojure 
>>> internals (including the library implementations of the called functions).
>>> 
>>> ;; Comparing the two routines per function call, for existing project case 
>>> (i.e. where roles are updated)
>>> ;;
>>> ;; Assuming each routine allocates new vector for new-role placement in 
>>> existing project
>>> ;; and MapEntry for assoc of new vector on project_roles, so they aren't 
>>> included in allocations 
>>> ;; below since both routines have to do it.
>>> ;;
>>> ;; Note that x-element map allocates storage for map and map-entries or 
>>> clojure equivalent.
>>> ;; (and more expensive than an x-element vector, of course).
>>> ;;
>>> ;; n == length of input project list.
>>> ;; m == average length of input project list role vectors.
>>> ;; 
>>> ;; Object Allocations
>>> ;;   Function call:
>>> ;;     update-roles: 
>>> ;;       1 atom
>>> ;;       1 O(n) vector for mapv 
>>> ;;     update-or-insert-project-role: 
>>> ;;       1 3-entry map + 1 single-element vector for prj-role argument 
>>> input.
>>> ;;       1 n-element map for group-by
>>> ;;       n vectors for group-by map values
>>> ;;       1 n-element map for update-in
>>> ;;       1 list/sequence for mapcat (+ n concat intermediaries?)
>>> ;;       1 vector for into
>>> ;;
>>> ;; If we discard the second 'into' and first 'mapv' allocations the 
>>> update-or-insert-project-role routine allocates
>>> ;; 3 additional maps (two of which are O(n)), n additional vectors, and 1 
>>> additional list/sequence.
>>> ;;   
>>> ;; Searches/traversals/copies
>>> ;;  update-roles: 
>>> ;;       O(n) - mapv
>>> ;;  update-or-insert-project-role: 
>>> ;;       O(n) - group-by
>>> ;;       O(n) - update-in
>>> ;;       O(n) - mapcat
>>> ;;       O(n) - into
>>> ;;
>>> ;; Here's what update-or-insert-project-role allocates (by way of 
>>> assistance in assessing the above)
>>> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own]}], 3 
>>> [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}           
>>>   -- group-by
>>> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own :edit]}], 3 
>>> [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}       -- 
>>> update-in
>>> ;;({:project_id 1, :project_name "One", :project_roles [:own :edit]} 
>>> {:project_id 3, :project_name "Three", :project_roles [:edit]})             
>>>    -- mapcat
>>> ;;[{:project_id 1, :project_name "One", :project_roles [:own :edit]} 
>>> {:project_id 3, :project_name "Three", :project_roles [:edit]}]             
>>>    -- into
>>> 
>>> Thanks for your thoughts and suggestions. 
>> 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your 
>> first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Clojure" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/clojure/nhzH_xl8IlE/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Style, Efficiency, and updating nested structure

Reply via email to