I found an issue with Clojure's behavior on iterators that somewhat relates
to what was discussed in the comment thread of
http://dev.clojure.org/jira/browse/CLJ-1738. I'm posting it here to raise
awareness and to see if anyone thinks it is a legitimate concern or
"behaving as expected".
Fortunately, this issue hasn't come up too often, but Java 6's
implementation of java.util.IdentityHashMap can be used to demonstrate the
situation. Note, the implementation of java.util.IdentityHashMap changed
in Java 7+ so to see it, you'd have to use the Java 6 version. My concern
is more general than this specific occurrence though.
-------------------------
(def idm (java.util.IdentityHashMap. {:a 1 :b 2}))
(seq idm)
;;= (#object[java.util.IdentityHashMap$EntryIterator 0x70530bd9 ":b=2"]
;;   #object[java.util.IdentityHashMap$EntryIterator 0x70530bd9 ":b=2"])
(into {} idm)
;;= {:b 2}
;; Really conceptually the same as `into` above, just explicitly expressed
;; for clarity.
(reduce conj {} idm)
;;= {:b 2}
;; Use transducers to try to pull key val out immediately
(into {} (map (juxt key val)) idm)
;;= {:b 2}
(sequence (map (juxt key val)) idm)
;;= ([:a 1] [:b 2])
(eduction (map (juxt key val)) idm)
;;= ([:a 1] [:b 2])
-------------------------
The issue is that the class IdentityHashMap$EntryIterator is used to
iterate the entry set of the map. It can be seen at
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/IdentityHashMap.java#IdentityHashMap.EntryIterator
This class is strange in that it plays both the role of java.util.Iterator
and java.util.Map$Entry at the same time. It mutates itself in place as a
Map$Entry as it iterates.
This seems to be a sort of crazy implementation scenario, but it doesn't
seem to violate any properties of Iterator or Iterable's contracts.
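To make the shape of the problem visible without a Java 6 install, here is a minimal sketch of an iterator that behaves the same way. `self-mutating-entries` is a hypothetical helper of mine, not anything from the JDK or Clojure; it only imitates the "iterator is also the entry" trick. The outputs shown assume Clojure 1.7+, where `seq` reads an Iterable's iterator in chunks.

```clojure
(defn self-mutating-entries
  "Hypothetical helper: returns an Iterable whose iterator hands back
  *itself* as the Map$Entry for every element, overwriting its own state
  on each call to next (imitating Java 6's IdentityHashMap$EntryIterator)."
  [pairs]
  (reify Iterable
    (iterator [_]
      (let [remaining (atom (seq pairs))
            current   (atom nil)]
        (reify
          java.util.Iterator
          (hasNext [_] (boolean (seq @remaining)))
          (next [this]
            (when-not (seq @remaining)
              (throw (java.util.NoSuchElementException.)))
            (reset! current (first @remaining))
            (swap! remaining rest)
            this) ; same object every time, only its state changes
          java.util.Map$Entry
          (getKey [_] (first @current))
          (getValue [_] (second @current)))))))

(def entries (self-mutating-entries [[:a 1] [:b 2]]))

;; Realizing a seq first holds several references to the one mutated object.
(mapv (juxt key val) (seq entries))
;;= [[:b 2] [:b 2]]

;; Reading key and val before the iterator advances sees every pair.
(reduce (fn [m e] (assoc m (key e) (val e))) {} entries)
;;= {:a 1, :b 2}
```

The second call works only because this object is itself Iterable, so `reduce` walks the iterator one step at a time instead of going through `seq`.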
So above:
* `seq` is fairly obviously going to have a problem with anything that
mutates in place
* `into` is a surprise to me. `into` is based on `reduce`, which I would
expect to be inherently 1-item at a time and therefore to avoid the issue.
* `reduce` just confirms there is an issue with the reducing that underlies
`into` above
Going to transducers:
* `into` with transducers doesn't help anything - not surprisingly
* `sequence` ends up working out. `sequence` internally uses a chunked
iterator over a clojure.lang.TransformerIterator. The TransformerIterator
gets to apply the transformation before the iterator moves on to the next
item, so `key` + `val` is called "soon enough" to keep us safe.
* `eduction` works out for the same reason as `sequence`, but there is no
chunking to be concerned with.
I think most of this is as I expect. The one that bothers me is that
`reduce` is unable to traverse this IdentityHashMap. I'd typically think
of `reduce`, and anything based on it, as "safe" in the sense of being sure
to fully access 1 item at a time, regardless of whether the underlying
iterator is doing in-place mutation.
Also, since `reduce` doesn't work like I'd expect here, it makes me
question if I could rely on the fact that `sequence` and `eduction` do
happen to work out right now.
Digging into `reduce`, the issue seems to be fairly clear. An
IdentityHashMap does not implement Iterator, Iterable, clojure.lang.IReduce,
or clojure.lang.IReduceInit. This is true for most non-Clojure Maps.
`reduce` is based on protocols. The first is
`clojure.core.protocols/CollReduce` with the
`clojure.core.protocols/coll-reduce` function. A non-Iterable Map falls
into the default Object implementation of `coll-reduce`. In turn this goes
to `clojure.core.protocols/seq-reduce`. `seq-reduce` calls `seq` on the
collection.
In the context of this example, this means we end up having `reduce` on an
IdentityHashMap call through to `seq` before reducing. As seen and
expected above, `seq` isn't going to work out with this mutate-in-place
sort of iterator.
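The dispatch described above can be checked at the REPL. This is just a quick sketch using `instance?`; the var name `m` is mine, and clojure.lang.IReduceInit assumes Clojure 1.7+. These checks do not depend on the Java version.

```clojure
(def m (java.util.IdentityHashMap. {:a 1 :b 2}))

;; The map itself matches none of the "direct" reduce paths, so it falls
;; through to the default Object implementation (and therefore to `seq`).
(instance? Iterable m)
;;= false
(instance? clojure.lang.IReduceInit m)
;;= false

;; Its entry set, on the other hand, is Iterable.
(instance? Iterable (.entrySet m))
;;= true
```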
---------------
I bring this up here because it has snuck up on me a few times. I haven't
found many java.util.Map impls in the wild that cause this trouble beyond
java.util.IdentityHashMap. However, it worries me that I need to check
each impl I may use if I'm going to trust anything that `reduce`s on it. I
tend to just completely avoid `reduce` with Java interop now, since I don't
know what Clojure's guarantees are. Instead I use an explicit `loop` +
`recur` that manually traverses the entry set via its iterator one at a
time for these sorts of maps.
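For reference, the manual traversal I mean looks roughly like the sketch below. `map->clj` is a hypothetical name; the point is only that the key and value are read from each entry before `.next` is called again.

```clojure
(defn map->clj
  "Hypothetical helper: copies a java.util.Map into a Clojure map, reading
  the key and value of each entry before the iterator advances."
  [^java.util.Map m]
  (let [it (.iterator (.entrySet m))]
    (loop [acc (transient {})]
      (if (.hasNext it)
        (let [^java.util.Map$Entry e (.next it)]
          (recur (assoc! acc (.getKey e) (.getValue e))))
        (persistent! acc)))))

(map->clj (java.util.IdentityHashMap. {:a 1 :b 2}))
;;= {:a 1, :b 2}
```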
I'm curious to see others' thoughts on this issue. One thought I had was
that `clojure.core.protocols/CollReduce` could provide an explicit
implementation of `coll-reduce` for java.util.Map that (recursively) called
`coll-reduce` on its Iterable entry set. It looks like this would avoid
this particular issue at least.
e.g.
(extend-protocol clojure.core.protocols/CollReduce
  java.util.Map
  (coll-reduce
    ([coll f]
      (clojure.core.protocols/coll-reduce (.entrySet coll) f))
    ([coll f val]
      (clojure.core.protocols/coll-reduce (.entrySet coll) f val))))
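A less invasive variant of the same idea, for anyone who doesn't want to extend the protocol globally, is to reduce over the entry set directly at the call site. This is only a sketch: `reduce-entries` is a hypothetical name, and it relies on the entry set taking the Iterable path of `coll-reduce`.

```clojure
(defn reduce-entries
  "Hypothetical helper: reduces over the Iterable entry set of a
  java.util.Map instead of over the map itself."
  [f init ^java.util.Map m]
  (reduce f init (.entrySet m)))

(reduce-entries (fn [acc e] (assoc acc (key e) (val e)))
                {}
                (java.util.IdentityHashMap. {:a 1 :b 2}))
;;= {:a 1, :b 2}
```

Note that the reducing function still has to pull `key` and `val` out of each entry right away; holding on to the entry objects themselves would bring the original problem back.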