On 5/6/20 5:05 PM, Alan Snyder wrote:
Let me clarify. I was referring to the non-support by the framework, which restricts membership semantics:
...[specification text regarding equals()]...

OK, thanks for the clarification.

Although the framework provides implementation classes that MAY be used to create nonconforming collections, these classes are called out as potentially non-conforming. For example:

[IdentityHashMap]

*This class is /not/ a general-purpose |Map| implementation! While this class implements the |Map| interface, it intentionally violates |Map's| general contract, which mandates the use of the |equals| method when comparing objects. This class is designed for use only in the rare cases wherein reference-equality semantics are required.*

And in many places, the implications of non-conformance are also called out:

*While the object returned by this method implements the |Collection| interface, it does /not/ obey |Collection's| general contract. Like its backing map, the collection returned by this method defines element equality as reference-equality rather than object-equality. This affects the behavior of its |contains|, |remove| and |containsAll| methods.*

The phrase “affects the behavior” is open ended. It means all bets are off.

No, all bets are not off.

"All bets are off" isn't an unreasonable interpretation of the current wording of the specification, but in fact the specification is incredibly poorly worded and confusing. Plus there are several out-and-out bugs in the spec.

It's also somewhat odd to say that some collection implementations are conforming whereas others aren't. Implementations can have bugs that render them non-conformant to a specification. In this case, though, parts of the specification contradict each other. You could pick part A of the spec and say that part B is "non-conformant" but that really doesn't make sense if you consider the totality of the specification. The fact is that the spec as a whole is self-inconsistent. I intend to fix that.

My point is that (fully) supporting custom membership semantics changes a fundamental property of the original design and should be approached as a redesign. It is not a matter of fixing a series of issues one at a time, as you discover them. Calling these issues semantic bugs when there is no violation of the specification is leading you down the wrong path, in my opinion.

The issue of membership semantics is part of the original design.

If you look at the history, the collections framework was added in JDK 1.2, and the JDK 1.2 specification includes both Set and SortedSet. There is the wording in the SortedSet specification that is very similar to what exists today: [edited for brevity]

    Note that the ordering maintained by a sorted set must be consistent with
    equals if the sorted set is to correctly implement the Set interface.
    This is so because the Set interface is defined in terms of the equals
    operation, but a sorted set performs all element comparisons using its
    compareTo (or compare) method. The behavior of a sorted set is well-defined
    even if its ordering is inconsistent with equals; it just fails to obey the
    general contract of the Set interface.

The original designers (including Josh Bloch, and probably others) clearly thought about these issues enough to put that wording in the specification. I don't think they thought through the full implications of this, or at least they didn't write it down, hence the vague wording above.

In addition, the issue was forgotten, and over time, bugs were introduced in the system that made things worse. For example, the "optimization" that is the root cause of the bug we are discussing (JDK-6394757) was introduced in JDK 1.3. In the comments on this bug report [1] Bloch is quoted as saying

    The spec clearly states that removeAll "Removes all this collection's
    elements that are also contained in the specified collection." So the
    current behavior is not just unpredictable, it's broken. It worked until
    1.3, and was broken in 1.3-1.5. Sigh...

    (And indeed, AbstractCollection.removeAll does precisely what the
    Collection.removeAll spec demands)

[1] https://bugs.openjdk.java.net/browse/JDK-6394757?focusedCommentId=12242803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12242803

Another bug, a spec bug this time, was introduced later. (I haven't searched the history to find out exactly when.) The TreeSet.add() method specification says this: [2]

    Adds the specified element to this set if it is not already present. More
    formally, adds the specified element e to this set if the set contains no
    element e2 such that Objects.equals(e, e2). If this set already contains the
    element, the call leaves the set unchanged and returns false.

[2] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/TreeSet.html#add(E)

This is simply wrong, as TreeSet membership is determined by the comparison method, not by equals().

What would (full) support for custom membership semantics look like? What restrictions would you place on the contains() method? What axioms would you specify for the generic operations? Is substitutability tossed out the window? What does it mean for an object to be a member of a set (according to contains) but not be returned by the iterator or included in the size? What should Set.copyOf() do if its argument is a non-conforming collection? etc. etc.

I think it's fair to raise all of these questions if "all bets are off" but indeed all bets are not off. I think the root of the problem is in the statement from the specification,

    The behavior of a sorted set is well-defined even if its ordering is
    inconsistent with equals; it just fails to obey the general contract
    of the Set interface.

This statement is essentially impossible to understand without looking at the history, various bugs (such as this one), the design intent of the collections framework, etc. What it really means is the following.

A SortedSet being "well-defined" answers most of your questions above. The obvious axioms all still apply. If you add an element and add() returns true, then contains() on that element should return true, and iterating the set should at some point return that element, etc.

The "general contract" phrase that appears in several places doesn't mean "the totality of the Set interface" but instead it means specifically what we've been calling the "membership contract" of the set. That is,

    Sets contain no pair of elements e1 and e2 such that e1 $ e2.

Where $ is:

    compare(e1, e2) == 0   for SortedSet
    ==                     for sets built from IdentityHashMap
    equals()               for Sets excluding the above

Once it's clear that "general contract" really means membership contract, many things become clearer. Most bets in fact are still on. Issues arise only when there are cases where there are two sets at play, and the question of which set's membership contract is in use needs to be specified. For copy constructors, Set.copyOf, and addAll(), it's clear that elements are added according to the receiver's membership contract. Other methods require a bit more analysis:

    containsAll()
    equals()
    removeAll()
    retainAll()

The containsAll() and equals() methods both use the membership contract of the receiver, not the argument. Unfortunately, the equals() specification says,

    Returns true if the specified object is also a set, the two sets have the
    same size, and every member of the specified set is contained in this set
    (or equivalently, every member of this set is contained in the specified
    set).

As should be clear from this discussion, the "equivalently" clause is incorrect -- another spec bug.

Finally, removeAll() and retainAll() use the membership semantics of the argument. I've discussed this elsewhere in this thread.

To summarize, fixing this bug is part of a larger effort to clean up various ambiguities and bugs in the specifications, as well as bugs in implementations. The fact that I'm proceeding incrementally shouldn't be interpreted to mean that I'm fixing semantic issues one by one as I come across them.

I understand that trade-offs are necessary, and if I understand correctly, the CSR process should review intentional regressions. Do you agree?

Yes, I will definitely file a CSR for this, to include specification changes as well as the behavioral and performance tradeoffs.

s'marks

Reply via email to