Re: Style, Efficiency, and updating nested structure

Dave Tenny Thu, 12 Nov 2015 06:38:55 -0800

Thanks for the replies, as always lots of interesting ways to do it in
clojure.  No zipper sugestions?


For my own take, I happen to like the last suggestion (Gareth's) the most I
think, of the ones offered so far.  It has a lispy elegance and is still
relatively efficient in traversals and memory allocation, compared to some
solutions (particular with a revision to the merge-roles implementation to
eliminate all the set creation, see next sentence).

Some of you rewrote the problem a bit.  For example, the 'merge-roles'
conversion to sets is very expensive, especially since I know that the
input in my original problem isn't going to offer duplicate project rules
for a given project ID.

On efficiency in general, it is perhaps a generational thing that people
say "don't prematurely optimize" so often and then cons up intermediate
objects with wild abandon.  And while a well crafted stop and copy garbage
collector can be have negligable GC time given sufficient memory free
space, allocations are still not free even in the best case, not to mention
the time to build the object being allocated (like hashing/comparing all
the values in a set).

I'd like to suggest that there is a difference between premature
optimization and avoiding flagrantly inefficient code.
If you code so that you routinely copy sequences and nested data structures
without a care, then in a serious/large system
your performance will have death by a thousand cuts.  The big O matters, at
least in some applications.

As indicated in this problem, the input is coming from a database, i.e. a
potentially large record source. My bias is that per-record processing is
always best kept to a minimum.  So inverting maps, copying sequences,
creating sets and such _*for every record*_ is just a big "nope" for me.
 And that doesn't even consider what went into the retrieval of the records
from clojure.java.jdbc/query, where every record comes back as a map unless
you do something to prevent it.

Meanwhile there were good suggestions for both abstraction and efficiency
here, thanks.  I often forget about lazy-seq and have a tendency to lean
toward loop/recur, probably from too many years using Common Lisp.  And of
course elimination of the mutable change detection hack is good too.    So
nice to see the different ways people tackle the problem.


On Wed, Nov 11, 2015 at 10:02 PM, Gareth Clapperton <gclapper...@live.com>
wrote:

> Hey Daves
>
> I too like the map idea (and using sets for roles for that matter). But
> sticking with collections, I might do something like this
>
> (defn upsert
>   "Updates or inserts v into coll"
>   [key-fn update-fn v coll]
>   (let [k    (key-fn v)]
>     (letfn [(step [v [x & xs]]
>               (lazy-seq
>                 (cond
>                   (nil? x)                         (when v [v])
>                   (and (some? v) (= (key-fn x) k)) (cons (update-fn x v)
> (step nil xs))
>                   :default                         (cons        x
>  (step  v  xs)))))]
>       (step v coll))))
>
> (defn merge-roles [p1 {:keys [project_roles] :as p2}]
>   (update-in p1 [:project_roles] #(vec (clojure.set/union (set %) (set
> project_roles)))))
>
> (upsert :project_id merge-roles prj-role prj-roles)
>
> Gareth
>
>
> On Wednesday, November 11, 2015 at 4:25:51 PM UTC-5, Dave Tenny wrote:
>>
>> A colleague and I are debating various things clojure as we were
>> exploring alternative ways to solve a problem.
>>
>> Here's the description of the problem that a particular function is
>> trying to solve, and the first implementation of it.
>>
>> (defn update-roles
>>   "Given a vector of maps of the form {:project_id N :project_name S
>> :project_roles [...roles...]}
>>   if there is already a map for the indicated project id/name, add
>> new-role to it and returned
>>   a copy the updated input vector, otherwise return a vector with a new
>> map entry for the newly found
>>   project and initial role.  This function is basically aggregating
>> tuples from the database."
>>   [projects project-id project-name new-role]
>>   (let [updated? (atom nil)
>>
>>         projects* (mapv (fn [m]
>>                           (if (= (:project_id m) project-id)
>>                             (do (reset! updated? true)
>>                                 (assoc m :project_roles (conj (:project_roles
>> m) new-role)))
>>                             m))
>>                         projects)]
>>     (if @updated?
>>       projects*
>>       (conj projects {:project_id project-id :project_name project-name 
>> :project_roles
>> [new-role]}))))
>>
>>
>> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles [:own
>> ]}] 2 "Two" :edit)
>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
>> {:project_id
>> 2, :project_name "Two", :project_roles [:edit]}]
>> ;; (update-roles [{:project_id 1 :project_name "One" :project_roles [:own
>> ]}] 1 "Two" :edit)
>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]}]
>>
>>
>>
>> Now here's another implementation:
>>
>> (defn update-or-insert-project-role
>>   [prj-roles prj-role]
>>   (let [to-insert-prj-id (:project_id prj-role)
>>         by-pid           (group-by :project_id prj-roles)]
>>     (case (get by-pid to-insert-prj-id)
>>       nil (conj prj-roles prj-role)
>>       (->> (update-in by-pid [to-insert-prj-id 0 :project_roles] #(apply 
>> conj % (:project_roles prj-role)))
>>            (mapcat second)
>>            (into [])))))
>>
>> ;; (def prj-roles [{:project_id 1, :project_name "One", :project_roles 
>> [:own]} {:project_id 3 :project_name "Three" :project_roles [:edit]}])
>> ;; (update-or-insert-project-role prj-roles {:project_id 2 :project_name 
>> "Two" :project_roles [:edit]})
>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own]} 
>> {:project_id 3, :project_name "Three", :project_roles [:edit]} {:project_id 
>> 2, :project_name "Two", :project_roles [:edit]}]
>> ;; (update-or-insert-project-role prj-roles {:project_id 1 :project_name 
>> "One" :project_roles [:edit]})
>> ;; => [{:project_id 1, :project_name "One", :project_roles [:own :edit]} 
>> {:project_id 3, :project_name "Three", :project_roles [:edit]}]
>>
>>
>>
>>
>> The function is called in a loop to aggregate rows from a database,
>> though it isn't an overriding concern, we're not going millions of records
>> in this case.
>>
>> The first thing my colleague and I disagreed on was the calling sequence,
>> arguing over which is more readable.
>> The second thing was whether efficiency in this context is really
>> important, or whether it's all moot in clojure.
>>
>> Finally, I'm sure there's a better way, probably with Zippers or
>> something, but neither of us have used them. Suggestions for the stylistic
>> and operational epitome of clojure expression on this routine are welcome!
>>
>> Superficially, and probably incorrect in multiple ways, here is a poor
>> attempt at breaking down efficiency in terms of search/traversal and memory
>> allocations.  This was done by someone with no knowledge of clojure
>> internals (including the library implementations of the called functions).
>>
>> ;; Comparing the two routines per function call, for existing project
>> case (i.e. where roles are updated)
>> ;;
>> ;; Assuming each routine allocates new vector for new-role placement in
>> existing project
>> ;; and MapEntry for assoc of new vector on project_roles, so they aren't
>> included in allocations
>> ;; below since both routines have to do it.
>> ;;
>> ;; Note that x-element map allocates storage for map and map-entries or
>> clojure equivalent.
>> ;; (and more expensive than an x-element vector, of course).
>> ;;
>> ;; n == length of input project list.
>> ;; m == average length of input project list role vectors.
>> ;;
>> ;; Object Allocations
>> ;;   Function call:
>> ;;     update-roles:
>> ;;       1 atom
>> ;;       1 O(n) vector for mapv
>> ;;     update-or-insert-project-role:
>> ;;       1 3-entry map + 1 single-element vector for prj-role argument
>> input.
>> ;;       1 n-element map for group-by
>> ;;       n vectors for group-by map values
>> ;;       1 n-element map for update-in
>> ;;       1 list/sequence for mapcat (+ n concat intermediaries?)
>> ;;       1 vector for into
>> ;;
>> ;; If we discard the second 'into' and first 'mapv' allocations the
>> update-or-insert-project-role routine allocates
>> ;; 3 additional maps (two of which are O(n)), n additional vectors, and 1
>> additional list/sequence.
>> ;;
>> ;; Searches/traversals/copies
>> ;;  update-roles:
>> ;;       O(n) - mapv
>> ;;  update-or-insert-project-role:
>> ;;       O(n) - group-by
>> ;;       O(n) - update-in
>> ;;       O(n) - mapcat
>> ;;       O(n) - into
>> ;;
>> ;; Here's what update-or-insert-project-role allocates (by way of
>> assistance in assessing the above)
>> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own]}], 3
>> [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}
>>   -- group-by
>> ;;{1 [{:project_id 1, :project_name "One", :project_roles [:own :edit]}],
>> 3 [{:project_id 3, :project_name "Three", :project_roles [:edit]}]}
>> -- update-in
>> ;;({:project_id 1, :project_name "One", :project_roles [:own :edit]}
>> {:project_id 3, :project_name "Three", :project_roles [:edit]})
>>    -- mapcat
>> ;;[{:project_id 1, :project_name "One", :project_roles [:own :edit]}
>> {:project_id 3, :project_name "Three", :project_roles [:edit]}]
>>    -- into
>>
>> Thanks for your thoughts and suggestions.
>>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "Clojure" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/clojure/nhzH_xl8IlE/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Style, Efficiency, and updating nested structure

Reply via email to