Reducing non-Clojure maps may not behave as expected

Mike Rodriguez Wed, 21 Dec 2016 05:24:45 -0800

I found an issue with Clojure's behavior on iterators that somewhat relates 
to what was discussed in the comment thread of 
http://dev.clojure.org/jira/browse/CLJ-1738.  I'm posting it here to raise 
awareness and to see whether anyone thinks it is a legitimate concern or 
"behaving as expected".


Fortunately, this issue hasn't come up too often, but Java 6's 
implementation of java.util.IdentityHashMap can be used to demonstrate the 
situation.  Note that the implementation of java.util.IdentityHashMap 
changed in Java 7+, so to see it you'd have to use the Java 6 version.  My 
concern is more general than this specific occurrence, though.

-------------------------

(def idm (java.util.IdentityHashMap. {:a 1 :b 2}))

(seq idm)
;;= (#object[java.util.IdentityHashMap$EntryIterator 0x70530bd9 ":b=2"]
;;   #object[java.util.IdentityHashMap$EntryIterator 0x70530bd9 ":b=2"])

(into {} idm)
;;= {:b 2}

;; Really conceptually the same as `into` above, just explicitly expressed
;; for clarity.
(reduce conj {} idm)
;;= {:b 2}

;; Use transducers to try to pull key val out immediately

(into {} (map (juxt key val)) idm)
;;= {:b 2}

(sequence (map (juxt key val)) idm)
;;= ([:a 1] [:b 2])

(eduction (map (juxt key val)) idm)
;;= ([:a 1] [:b 2])

-------------------------

The issue is that the class IdentityHashMap$EntryIterator is used to 
iterate the entry set of the map.  It can be seen at
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/IdentityHashMap.java#IdentityHashMap.EntryIterator

This class is strange in that it plays both the role of java.util.Iterator 
and java.util.Map$Entry at the same time.  It mutates itself in place as a 
Map$Entry as it iterates.
This seems to be a sort of crazy implementation scenario, but it doesn't 
seem to violate any properties of Iterator or Iterable's contracts.

So above:
* `seq` is fairly obviously going to have a problem with anything that 
mutates in place
* `into` is a surprise to me.  `into` is based on `reduce`, which I would 
expect to be inherently one item at a time and therefore to avoid the issue.
* `reduce` just confirms there is an issue with the reducing that underlies 
`into` above

Going to transducers:
* `into` with transducers doesn't help anything - not surprisingly
* `sequence` ends up working out.  `sequence` internally uses a chunked 
iterator over a clojure.lang.TransformerIterator.  The TransformerIterator 
gets to apply the transformation before the iterator moves on to the next 
item, so `key` + `val` is called "soon enough" to keep us safe.
* `eduction` works out for the same reason as `sequence`, but there is no 
chunking to be concerned with.

I think most of this is as I expect; the one that bothers me is that 
`reduce` is unable to traverse this IdentityHashMap correctly.  I'd 
typically think of `reduce`, and anything based on it, as "safe" in terms 
of being sure to fully access one item at a time, regardless of whether the 
underlying iterator is doing in-place mutation.

Also, since `reduce` doesn't work like I'd expect here, it makes me 
question if I could rely on the fact that `sequence` and `eduction` do 
happen to work out right now.

Digging into `reduce`, the issue seems to be fairly clear.  An 
IdentityHashMap does not implement Iterator, Iterable, clojure.lang.IReduce, 
or clojure.lang.IReduceInit.  This is true for most non-Clojure Maps.
`reduce` is based on protocols.  The first is 
`clojure.core.protocols/CollReduce` with the 
`clojure.core.protocols/coll-reduce` function.  A non-Iterable Map falls 
into the default Object implementation of `coll-reduce`.  In turn this goes 
to `clojure.core.protocols/seq-reduce`.  `seq-reduce` calls `seq` on the 
collection.

In the context of this example, this means that `reduce` on an 
IdentityHashMap calls through to `seq` before reducing.  As seen and 
expected above, `seq` isn't going to work out with this mutate-in-place 
sort of iterator.
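
A quick REPL check of the type facts behind that dispatch.  This is only a 
sketch: `idm` is rebuilt here so the snippet stands alone, and the type 
hint on `.entrySet` is just there to avoid reflection.

```clojure
(def idm (java.util.IdentityHashMap. {:a 1 :b 2}))

;; The map itself is not Iterable and does not reduce itself, so
;; `coll-reduce` has nothing more specific than the Object case to pick.
(instance? Iterable idm)
;;= false
(instance? clojure.lang.IReduceInit idm)
;;= false

;; Its entry set, on the other hand, is Iterable.
(instance? Iterable (.entrySet ^java.util.Map idm))
;;= true
```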

---------------

I bring this up here because it has snuck up on me a few times.  I haven't 
found many java.util.Map implementations in the wild that cause this 
trouble beyond java.util.IdentityHashMap.  However, it worries me that I 
need to check each implementation I may use if I'm going to trust anything 
that `reduce`s on it.  I tend to just completely avoid it with Java interop 
now, because I don't know what Clojure's guarantees are.  So I just use an 
explicit `loop` + `recur` that manually traverses the entry set via its 
iterator, one entry at a time, for these sorts of maps.
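
For reference, a minimal sketch of that kind of manual traversal.  
`reduce-entries` is a hypothetical helper name, not anything in 
clojure.core.  It assumes the reducing function takes the accumulator, key, 
and value (like `reduce-kv`), and it reads the key and value out of each 
entry before advancing the iterator.

```clojure
(defn reduce-entries
  "Hypothetical helper: reduces over the entries of a java.util.Map one
  entry at a time.  Calls (f acc k v) for each entry and honors `reduced`."
  [f init ^java.util.Map m]
  (let [it (.iterator (.entrySet m))]
    (loop [acc init]
      (if (.hasNext it)
        (let [^java.util.Map$Entry e (.next it)
              ;; Key and value are extracted here, before the next call to
              ;; .next, so an entry that mutates in place cannot bite us.
              acc' (f acc (.getKey e) (.getValue e))]
          (if (reduced? acc')
            @acc'
            (recur acc')))
        acc))))

;; Builds a Clojure map equal to {:a 1, :b 2}.
(reduce-entries assoc {} (java.util.IdentityHashMap. {:a 1 :b 2}))
```

Passing key and value separately is just one choice; handing `f` a 
`[k v]` vector instead would make it a closer drop-in for `reduce`.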

I'm curious to see others' thoughts on this issue.  One thought I had was 
that `clojure.core.protocols/CollReduce` could provide an explicit 
implementation of `coll-reduce` for java.util.Map that (recursively) called 
`coll-reduce` on its Iterable entry set.  As far as I can tell, this would 
at least avoid this particular issue.

e.g.

(extend-protocol clojure.core.protocols/CollReduce
  java.util.Map
  (coll-reduce
    ([coll f]
     (clojure.core.protocols/coll-reduce (.entrySet coll) f))
    ([coll f val]
     (clojure.core.protocols/coll-reduce (.entrySet coll) f val))))
