Most interesting ... a few observations:

   1. It is interesting to see how much work is needed to populate the
   third output of diff (i.e. the data common to the two inputs)...I'm
   wondering how often it's the case that callers really only care about the
   differences and not the similarities?  Does it make sense to have a flavor
   of data/diff that explicitly only does the first two calculations?
   2. I'm been thinking more and I think what would be really nifty would
   be if the Red/Black tree in
   
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentTreeMap.java
   also were an instance of a Merkle tree, so one could compare the hashes for
   nodes high in two similar TreeMaps and thereby tell that the entire subtree
   under those pairs of nodes were identical and therefore could be skipped
   during the comparison.  But that's certainly a nontrivial amount of work
   and the overhead probably would be excessive :-(

But thanks so much for showing what transients can do...I have lots to
learn!

Thanks again!

Marshall

On Fri, Sep 9, 2016 at 1:10 PM, Francis Avila <fav...@breezeehr.com> wrote:

> The only way to have diff consider internal structures is to have a diff
> protocol (with access to internals) to speed up comparison between items of
> the exact same Java class.
>
> However, much of the slowdown you see is from intermediate vectors created
> during the reductions in diff-associative and diff-associative-key (two
> private functions in clojure.data)
>
> This is what Clojure 1.8 has now:
>
> (defn- diff-associative
>   "Diff associative things a and b, comparing only keys in ks."
>   [a b ks]
>   (reduce
>    (fn [diff1 diff2]
>      (doall (map merge diff1 diff2)))
>    [nil nil nil]
>    (map
>     (partial diff-associative-key a b)
>     ks)))
>
> (defn- diff-associative-key
>   "Diff associative things a and b, comparing only the key k."
>   [a b k]
>   (let [va (get a k)
>         vb (get b k)
>         [a* b* ab] (diff va vb)
>         in-a (contains? a k)
>         in-b (contains? b k)
>         same (and in-a in-b
>                   (or (not (nil? ab))
>                       (and (nil? va) (nil? vb))))]
>     [(when (and in-a (or (not (nil? a*)) (not same))) {k a*})
>      (when (and in-b (or (not (nil? b*)) (not same))) {k b*})
>      (when same {k ab})
>      ]))
>
>
>
> This is my baseline:
>
> (defn speedtest []
>   (let [doDiff (fn [a b] (time
>                            (let [delta (clojure.data/diff a b)]
>                              [(first delta) (second delta)])))
>         x      (into {} (map vec (partition 2 (range 100000))))
>         x1     (into {} (map vec (partition 2 (range 100000))))
>         y      (assoc x -1 -1)]
>     [(doDiff x x)
>      (doDiff x x1)
>      (doDiff x y)]))
> => #'clojure.data/speedtest
> (speedtest)
> "Elapsed time: 0.131537 msecs"
> "Elapsed time: 31.887744 msecs"
> "Elapsed time: 247.431151 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
> (speedtest)
> "Elapsed time: 0.006951 msecs"
> "Elapsed time: 26.47309 msecs"
> "Elapsed time: 181.248303 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
> (speedtest)
> "Elapsed time: 0.007175 msecs"
> "Elapsed time: 23.102316 msecs"
> "Elapsed time: 174.300056 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
>
>
> Now lets replace diff-associative and diff-associative-key with mutable
> objects and transients to contain the reduction:
>
> (let [NF (Object.)] ;; not-found sentinel to avoid a contains? lookup per
> key
>   (defn- diff-associative-key
>     "Diff associative things a and b, comparing only the key k."
>     [a b k]
>     (let [va   (get a k NF)
>           vb   (get b k NF)
>           in-a (not (identical? va NF))
>           in-b (not (identical? vb NF))
>           [a* b* ab] (diff (when in-a va) (when in-b vb))
>           same (and in-a in-b
>                  (or (not (nil? ab))
>                    (and (nil? va) (nil? vb))))]
>       (doto (object-array 3)
>         (aset 0 (when (and in-a (or (not (nil? a*)) (not same))) {k a*}))
>         (aset 1 (when (and in-b (or (not (nil? b*)) (not same))) {k b*}))
>         (aset 2 (when same {k ab}))
>         ))))
>
> (defn- diff-associative
>   "Diff associative things a and b, comparing only keys in ks."
>   [a b ks]
>   (transduce
>     (map (partial diff-associative-key a b))
>     (fn
>       ([] (doto (object-array 3)
>             (aset 0 (transient {}))
>             (aset 1 (transient {}))
>             (aset 2 (transient {}))))
>       ([^objects diff]
>        [(not-empty (persistent! (aget diff 0)))
>         (not-empty (persistent! (aget diff 1)))
>         (not-empty (persistent! (aget diff 2)))])
>       ([^objects diff1 ^objects diff2] ;; These type hints are important!
>        (doto diff1
>          (aset 0 (conj! (aget diff1 0) (aget diff2 0)))
>          (aset 1 (conj! (aget diff1 1) (aget diff2 1)))
>          (aset 2 (conj! (aget diff1 2) (aget diff2 2))))))
>     ks))
>
> Such mutation, ugh. But, new results (after re-evaling the extend-type):
>
>
> (speedtest)
> "Elapsed time: 0.006056 msecs"
> "Elapsed time: 32.802827 msecs"
> "Elapsed time: 62.53305 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
> (speedtest)
> "Elapsed time: 0.006661 msecs"
> "Elapsed time: 26.700653 msecs"
> "Elapsed time: 70.025767 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
> (speedtest)
> "Elapsed time: 0.006686 msecs"
> "Elapsed time: 26.426425 msecs"
> "Elapsed time: 70.042115 msecs"
> => [[nil nil] [nil nil] [nil {-1 -1}]]
>
>
> Not bad.
>
> On Wednesday, September 7, 2016 at 9:16:31 PM UTC-5, Marshall handheld
> Flax wrote:
>>
>> Suppose I create a decently-sized persistent map, and then make a small
>> change.  Clojure is extremely efficient at creating a second persistent map
>> that shares most of its internal data with the first map, but
>> clojure.data/diff appears to be unable to take advantage of that shared
>> internal data -- instead it seems to just exhaustively compare the two
>> trees.
>>
>> Comparing a map with itself: "Elapsed time: 0.155404 msecs"
>> Comparing two identical -- but separately-created maps -- "Elapsed time:
>> 43.687906 msecs"
>> Comparing a map and one with a small change thereof: "Elapsed time:
>> 378.754266 msecs"
>>
>> Is there a way of comparing two maps that takes advantage of the
>> internals of the commonality between maps "x" and "y" below?
>>
>> Many thanks!!!
>>
>> Marshall
>>
>> (let [doDiff (fn[a b] (time
>>                        (let [delta (clojure.data/diff a b)]
>>                          [(first delta) (second delta)])))
>>       x      (into {} (map vec (partition 2 (range 100000))))
>>       x1     (into {} (map vec (partition 2 (range 100000))))
>>       y      (assoc x -1 -1)]
>>   [(doDiff x x)
>>    (doDiff x x1)
>>    (doDiff x y)])
>>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to