> From: "Glavo" <zjx001...@gmail.com> > To: "Stuart Marks" <stuart.ma...@oracle.com> > Cc: "John Hendrikx" <john.hendr...@gmail.com>, "core-libs-dev" > <core-libs-dev@openjdk.org> > Sent: Tuesday, January 31, 2023 10:12:24 PM > Subject: Re: NPE throwing behavior of immutable collections
> I understand that null is prohibited by default, but can we also provide a set > of factory methods that accept null? > They can be named like List.ofNullable/copyOfNullable. It is actually spelt, Arrays.asList() and collection.stream().toList(). Rémi > On Tue, Jan 31, 2023 at 10:13 AM Stuart Marks < [ > mailto:stuart.ma...@oracle.com > | stuart.ma...@oracle.com ] > wrote: >> In this reply I'll focus on the null handling issues in collections. >> As you've noted, there are really (at least) two distinct issues here: >> whether a >> collection permits nulls, and the behavior of contains(null) queries. There >> have >> been continual complaints about both, and the issues are somewhat distinct. >> The collection implementations in the JDK are all over the place with regard >> to >> nulls. I believe this is because the original collections and the concurrent >> collections up through JDK 1.6 or so were mostly done by Josh Bloch and Doug >> Lea, >> who disagreed about null handling. They had this exchange in 2006: >> Doug Lea wrote: [1] >> > The main reason that nulls aren't allowed in ConcurrentMaps >> > (ConcurrentHashMaps, ConcurrentSkipListMaps) is that >> > ambiguities that may be just barely tolerable in non-concurrent >> > maps can't be accommodated. The main one is that if >> > map.get(key) returns null, you can't detect whether the >> > key explicitly maps to null vs the key isn't mapped. >> > In a non-concurrent map, you can check this via map.contains(key), >> > but in a concurrent one, the map might have changed between calls. >> > Further digressing: I personally think that allowing >> > nulls in Maps (also Sets) is an open invitation for programs >> > to contain errors that remain undetected until >> > they break at just the wrong time. (Whether to allow nulls even >> > in non-concurrent Maps/Sets is one of the few design issues surrounding >> > Collections that Josh Bloch and I have long disagreed about.) >> Josh Bloch wrote: [2] >> > I have moved towards your position over the years. It was probably a >> > mistake to allow null keys in Maps and null elements in Sets. I'm >> > still not sure about Map values and List elements. >> > In other words, Doug hates null more than I do, but over the years >> > I've come to see it as quite troublesome. >> (I haven't talked to Josh or Doug about this particular issue recently, so I >> suppose >> it's possible their opinions have changed in the intervening time.) >> I had to decide what to do about nulls when I added the unmodifiable >> collections >> in >> JDK 9. Given that Doug "hates" nulls, and Josh finds them at least quite >> troublesome, I started from a position of disallowing null members in all the >> new >> collections. Most of the unmodifiable collections are still this way (but see >> below). >> What about contains(null)? Surely it should be ok to query for a null, and >> return >> false, even if the collection itself can't contain null. However, a bunch of >> collections and collection views in the JDK throw NPE on contains(null), so >> the >> unmodifiable collections don't set a precedent here. >> If you're dealing with all non-null values, and a null creeps in somehow, >> then >> calling contains(null) might be an error, and it would be good for the >> library >> to >> inform you of that instead of just returning false and letting the program >> continue >> until it fails later (fail-fast). A similar issue arises with the non-generic >> contains() method. If you have a Collection<Integer>, then shouldn't >> contains("abc") >> be an error? It's not, and it simply returns false, because the signature is >> contains(Object). But there have been complaints that this should have been >> an >> error >> because a Collection of Integer cannot possibly contain a String. Similar >> reasoning >> applies to contains(null) on a collection that can't contain null. (Yes, the >> error >> occurs at runtime instead of compile time, but better late than never.) >> A meta-issue is: should a new thing correct a perceived design flaw in the >> older >> thing, or should it be consistent with the old thing? Either way, you lose. >> Another factor in my original decision was that it's much easier to start >> with a >> restriction and relax it later than it is to go the other way. We have some >> flexibility to allow nulls and contains(null) if we want; but it's >> essentially a >> non-starter to disallow nulls once they've been allowed somewhere. >> That's why my starting position for the design prohibited nulls everywhere. >> Over time we've relaxed some restrictions around null handling in the >> unmodifiable >> collections. Streams permit nulls, and the List returned from Stream.toList() >> also >> permits nulls, and it also allows contains(null). So there's a different >> unmodifiable List implementation under there, accessible only via streams. >> (Note >> that the null-permitting unmodifiable lists lose the optimizations for sizes >> 0, >> 1, >> and 2, which are in part based on nulls being disallowed.) >> I'm fairly sympathetic to the case for changing the unmodifiable collections >> to >> allow contains(null), given that the current NPE-throwing behavior has >> generated >> a >> fair amount of complaints over the years. And the situation with having two >> flavors >> of unmodifiable list is quite odd and is an uncomfortable point in the design >> space. >> I should point out however that even if the unmodifiable collections were >> changed to >> allow contains(null), it's still kind of unclear as to whether this code is >> safe: >> public void addShoppingItems(Collection<String> shoppingItems) { >> if (shoppingItems.contains(null)) { // this looks quite reasonable and >> safe... >> throw new IllegalArgumentException("shoppingItems should not contain nulls"); >> } >> ... >> } >> If you're only ever passing ArrayList, Arrays.asList(), and List.of() lists >> here, >> then sure, this would work great. If the source collection is of unknown >> provenance, >> you can still run into trouble: >> 1. contains(null) calls on the keySet and values collections of >> ConcurrentHashMap >> and Hashtable all throw NPE. So does contains(null) on a set from >> Collections.newSetFromMap(), which is built from those maps' keySets. (I >> suppose >> Doug Lea might be convinced to relax the CHM behavior.) >> 2. Speaking of concurrency, if the source collection is being modified >> concurrently, >> then a null could be introduced after the check (TOCTOU) so you'd need to >> make a >> defensive copy before checking it. >> 3. NavigableSet and friends delegate contains() to the comparison method, >> which >> might or might not accept nulls. There's no way the collection implementation >> can >> tell in advance. >> You might consider List.copyOf() as an alternative, as that makes a null-free >> defensive copy in one shot, and avoids copying if the source is an >> unmodifiable >> list. If you don't care about concurrency, then >> shoppingItems.forEach(Objects::requireNonNull) >> might be suitable. >> If unmodifiable collections were to change to allow contains(null), this >> probably >> would make them somewhat less annoying, but it wouldn't necessarily solve >> yours >> or >> everyone's problems. There's simply no way at this point to make the >> collections' >> treatment of nulls uniform. >> s'marks >> [1] >> [ >> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html >> | >> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html >> ] >> [2] >> [ >> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html >> | >> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html >> ] >> On 1/29/23 7:28 AM, John Hendrikx wrote: >>> TLDR; why does contains(null) not just return false for collections that >>> don't >> > allow >> > nulls. Just because the interface allows it, does not mean it should do it >> > as it >> > devalues the usefulness of the abstraction provided by the interface. >> > Background: >>> I'm someone that likes to check correctness of any constructor or method >> > parameter >>> before allowing an object to be constructed or to be modified, in order to >> > maintain >> > invariants that are provided by the class to its users. >>> This ranges from simple null checks, to range checks on numeric values, to >> > complete >>> checks like "is a collection sorted" or is a list of nodes acyclic. >>> Anything I >> > can >>> check in the constructor that may avoid problems further down the line or >>> that >> > may >>> affect what I can guarantee on its own API methods. For example, if I check >>> in >> > the >>> constructor that something is not null, then the associated getter will >> > guarantee >>> that the returned value is not null. If I check that a List doesn't contain >> > nulls, >>> then the associated getter will reflect that as well (assuming it is >>> immutable >> > or >> > defensivily copied). >>> For collections, this is currently becoming harder and harder because more >>> and >> > more >>> new collections are written to be null hostile. It is fine if a collection >> > doesn't >> > accept nulls, but throwing NPE when I ask such a collection if it contains >> > null >> > seems to be counter productive, and reduces the usefulness of the >> > collection >> > interfaces. >> > This strict interpretation makes the collection interfaces harder to use >> > than >>> necessary. Interfaces are only useful when their contract is well defined. >>> The >> > more >>> things an interface allows or leaves unspecified, the less useful it is for >>> its >> > users. >>> I know that the collection interfaces allow this, but you have to ask >>> yourself >> > this >>> important question: how useful is an interface that makes almost no >>> guarantees >> > about >> > what its methods will do? Interfaces like `List`, `Map` and `Set` are >> > passed as >> > method parameters a lot, and to make these useful, implementations of these >> > interfaces should do their best to provide as consistent an experience as >> > is >>> reasonably possible. The alternative is that these interfaces will slowly >> > decline in >>> usefulness as methods will start asking for `ArrayList` instead of `List` to >> > avoid >> > having to deal with a too lenient specification. >>> With the collection interfaces I get the impression that recently there has >>> been >> > too >>> much focus on what would be easy for the collection implementation instead >>> of >> > what >>> would be easy for the users of said interfaces. In my opinion, the concerns >>> of >> > the >>> user of interfaces almost always far outweigh the concerns of the >>> implementors >> > of >> > said interfaces. >>> In the past, immutable collections were rare, but they get are getting more >>> and >> > more >>> common all the time. For example, in unit testing. Unfortunately, these >> > immutable >>> collections differ quite a bit from their mutable counterparts. Some parts >>> are >> > only >>> midly annoying (not accepting `null` as the **value** in a `Map` for >>> example), >> > but >> > other parts require code to be rewritten for it to be able to work as a >> > drop-in >> > replacement for the mutable variant. A simple example is this: >> > public void addShoppingItems(Collection<String> shoppingItems) { >> > if (shoppingItems.contains(null)) { // this looks quite reasonable and >> > safe... >> > throw new IllegalArgumentException("shoppingItems should not contain >> > nulls"); >> > } >> > this.shoppingItems.addAll(shoppingItems); >> > } >> > Testing this code is annoying: >> > x.addShoppingItems(List.of("a", null")); // can't construct immutable >> > collection with null >> > x.addShoppingItems(Arrays.asList("a", null")); // fine, go back to what we >> > did before then... >>> The above problems, I suppose we can live with it; immutable collections >>> don't >> > want >> > `null` although I don't see any reason to not allow it as I can write a >> > similar >>> `List.of` that returns immutable collections that do allow `null`. For JDK >>> code >> > this >>> is a bit disconcerting, as it is supposed to be as flexible as is reasonable >> > without >> > having too much of an opinion about what is good or bad. >> > This next one however: >> > assertNoExceptionThrown(() -> x.addShoppingItems(List.of("a", "b"))); // >> > not >> > allowed, contains(null) in application code throws NPE >>> This is much more insidious; the `contains(null)` check has been a very >> > practical >>> way to check if collections contain null, and this works for almost all >> > collections >> > in common use, so there is no problem. But now, more and more collections >> > are >>> starting to just throw NPE immediately even just to **check** if a null >>> element >> > is >> > present. This only serves to distinguish themselves loudly from other >> > similar >> > collections that will simply return `false`. >>> I think this behavior is detrimental to the java collection interfaces in >> > general, >> > especially since there is a clear answer to the question if such a >> > collection >>> contains null or not. In fact, why `contains` was ever allowed to throw an >> > exception >>> aside from "UnsupportedOperationException" is a mystery to me, and throwing >>> one >> > when >>> one could just as easily return the correct and expected answer seems very >> > opiniated >> > and almost malicious -- not behavior I'd expect from JDK core libraries. >>> Also note that there is no way to know in advance if `contains(null)` is >>> going >> > to be >>> throwing the NPE. If interfaces had a method `allowsNulls()` that would >>> already >> > be >> > an improvement. >> > --John