The only way to have diff consider internal structures is to have a diff 
protocol (with access to internals) to speed up comparison between items of 
the exact same Java class.

However, much of the slowdown you see is from intermediate vectors created 
during the reductions in diff-associative and diff-associative-key (two 
private functions in clojure.data)

This is what Clojure 1.8 has now:

(defn- diff-associative
  "Diff associative things a and b, comparing only keys in ks."
  [a b ks]
  (reduce
   (fn [diff1 diff2]
     (doall (map merge diff1 diff2)))
   [nil nil nil]
   (map
    (partial diff-associative-key a b)
    ks)))

(defn- diff-associative-key
  "Diff associative things a and b, comparing only the key k."
  [a b k]
  (let [va (get a k)
        vb (get b k)
        [a* b* ab] (diff va vb)
        in-a (contains? a k)
        in-b (contains? b k)
        same (and in-a in-b
                  (or (not (nil? ab))
                      (and (nil? va) (nil? vb))))]
    [(when (and in-a (or (not (nil? a*)) (not same))) {k a*})
     (when (and in-b (or (not (nil? b*)) (not same))) {k b*})
     (when same {k ab})
     ]))



This is my baseline:

(defn speedtest []
  (let [doDiff (fn [a b] (time
                           (let [delta (clojure.data/diff a b)]
                             [(first delta) (second delta)])))
        x      (into {} (map vec (partition 2 (range 100000))))
        x1     (into {} (map vec (partition 2 (range 100000))))
        y      (assoc x -1 -1)]
    [(doDiff x x)
     (doDiff x x1)
     (doDiff x y)]))
=> #'clojure.data/speedtest
(speedtest)
"Elapsed time: 0.131537 msecs"
"Elapsed time: 31.887744 msecs"
"Elapsed time: 247.431151 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]
(speedtest)
"Elapsed time: 0.006951 msecs"
"Elapsed time: 26.47309 msecs"
"Elapsed time: 181.248303 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]
(speedtest)
"Elapsed time: 0.007175 msecs"
"Elapsed time: 23.102316 msecs"
"Elapsed time: 174.300056 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]


Now lets replace diff-associative and diff-associative-key with mutable 
objects and transients to contain the reduction:

(let [NF (Object.)] ;; not-found sentinel to avoid a contains? lookup per 
key
  (defn- diff-associative-key
    "Diff associative things a and b, comparing only the key k."
    [a b k]
    (let [va   (get a k NF)
          vb   (get b k NF)
          in-a (not (identical? va NF))
          in-b (not (identical? vb NF))
          [a* b* ab] (diff (when in-a va) (when in-b vb))
          same (and in-a in-b
                 (or (not (nil? ab))
                   (and (nil? va) (nil? vb))))]
      (doto (object-array 3)
        (aset 0 (when (and in-a (or (not (nil? a*)) (not same))) {k a*}))
        (aset 1 (when (and in-b (or (not (nil? b*)) (not same))) {k b*}))
        (aset 2 (when same {k ab}))
        ))))

(defn- diff-associative
  "Diff associative things a and b, comparing only keys in ks."
  [a b ks]
  (transduce
    (map (partial diff-associative-key a b))
    (fn
      ([] (doto (object-array 3)
            (aset 0 (transient {}))
            (aset 1 (transient {}))
            (aset 2 (transient {}))))
      ([^objects diff]
       [(not-empty (persistent! (aget diff 0)))
        (not-empty (persistent! (aget diff 1)))
        (not-empty (persistent! (aget diff 2)))])
      ([^objects diff1 ^objects diff2] ;; These type hints are important!
       (doto diff1
         (aset 0 (conj! (aget diff1 0) (aget diff2 0)))
         (aset 1 (conj! (aget diff1 1) (aget diff2 1)))
         (aset 2 (conj! (aget diff1 2) (aget diff2 2))))))
    ks))

Such mutation, ugh. But, new results (after re-evaling the extend-type):

 
(speedtest)
"Elapsed time: 0.006056 msecs"
"Elapsed time: 32.802827 msecs"
"Elapsed time: 62.53305 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]
(speedtest)
"Elapsed time: 0.006661 msecs"
"Elapsed time: 26.700653 msecs"
"Elapsed time: 70.025767 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]
(speedtest)
"Elapsed time: 0.006686 msecs"
"Elapsed time: 26.426425 msecs"
"Elapsed time: 70.042115 msecs"
=> [[nil nil] [nil nil] [nil {-1 -1}]]


Not bad.

On Wednesday, September 7, 2016 at 9:16:31 PM UTC-5, Marshall handheld Flax 
wrote:
>
> Suppose I create a decently-sized persistent map, and then make a small 
> change.  Clojure is extremely efficient at creating a second persistent map 
> that shares most of its internal data with the first map, but 
> clojure.data/diff appears to be unable to take advantage of that shared 
> internal data -- instead it seems to just exhaustively compare the two 
> trees.  
>
> Comparing a map with itself: "Elapsed time: 0.155404 msecs"
> Comparing two identical -- but separately-created maps -- "Elapsed time: 
> 43.687906 msecs"
> Comparing a map and one with a small change thereof: "Elapsed time: 
> 378.754266 msecs"
>
> Is there a way of comparing two maps that takes advantage of the internals 
> of the commonality between maps "x" and "y" below?
>
> Many thanks!!!
>
> Marshall
>
> (let [doDiff (fn[a b] (time
>                        (let [delta (clojure.data/diff a b)]
>                          [(first delta) (second delta)])))
>       x      (into {} (map vec (partition 2 (range 100000))))
>       x1     (into {} (map vec (partition 2 (range 100000))))
>       y      (assoc x -1 -1)]
>   [(doDiff x x)
>    (doDiff x x1)
>    (doDiff x y)])
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to