Hello! Please consider the following real utility method I have:
/** * Returns an immutable list via {@link List#of} that is the * sorted content of the collection. */ public static <E> @NonNull List<E> sortedImmutableList( @NonNull Collection<E> col, @Nullable Comparator<? super E> comparator) { ArrayList<E> list = new ArrayList<>(col); list.sort(comparator); return List.copyOf(list); } Quiz question: Assume this method is invoked with a large HashSet as its collection argument. How many ARRAY copies of the collection are created by this method? Clearly there must be at least one for the result, held internally in the immutable list. Any others? (Scroll for the answer.) . . . . . . . . . . . . . . . . . . . I was surprised to realize the answer is four: 1. ArrayList calls toArray() on the HashSet. 2. ArrayList defensively copies the array to assign to its internal elementData field, because it doesn't trust HashSet. 3. List.copyOf calls toArray() on the ArrayList. 4. List.of defensively copies the array to assign to its internal elements field, because it doesn't trust ArrayList. It is quite possible to rewrite my particular utility method with a raw array instead of ArrayList, to eliminate 2 array copies, but this requires an icky unchecked cast, and by rights is the kind of thing that shouldn't be necessary anyway. ----- In general, this kind of excessive defensive copying is going on ALL THE TIME. The underlying nuisance is that a subclass of Collection could potentially override toArray() to keep a reference to the returned array and make mischief with it. (Also, for ArrayList, it would be a problem if the returned array were a subclass of Object[].) I did a search thru the JDK and found a slew of cases where code calls Collection.toArray() and then has to worry about defensively copying it -- or trying to avoid defensively copying it: - java.util.List.copyOf(Collection) -- avoids copying an existing List.of/copyOf list, but for any other collection it calls toArray() and then defensively copies it again - java.util.ArrayList(Collection) -- constructor has a special check `if (collection.getClass() == ArrayList.class)`, in which case it skips the defensive copy and trusts the result of toArray(), but it does not trust any other collection - java.util.Vector(Collection) -- constructor has the same optimization trick as ArrayList, where it trusts ArrayList, but creates a defensive copy for any other collection. Vector doesn't even trust itself! - java.util.PriorityQueue.initElementsFromCollection(Collection) -- has optimization for ArrayList, but no other collection - java.util.concurrent.PriorityBlockingQueue(Collection) -- has optimization for ArrayList, but no other collection - java.util.concurrent.CopyOnWriteArrayList(Collection) -- has optimizations for ArrayList and CopyOnWriteArrayList, but no other collection. - java.util.concurrent.CopyOnWriteArrayList.addAllAbsent(Collection) -- has optimization for ArrayList but no other collection. - java.util.stream.Collectors.toUnmodifiableList() -- this avoids an excess copy, but does so in a very complicated way - sun.awt.util.IdentityArrayList(Collection) -- this appears to be an old copy-paste of ArrayList except with identity indexOf; like pre-2020 ArrayList, it blindly trusts that `col.toArray()` returns a new array, although based on this class's extremely limited use there is no manifestable bug. - java.lang.invoke.MethodHandles.dropArguments/dropArgumentsToMatch -- these call `list.toArray()` and then clone the array unconditionally - java.time.zone.ZoneRules.of(...) -- calls `lastRules.toArray()` and then clones the array unconditionally ----- I would suggest that there should be an internal utility method for use in all these cases, something like: public static Object[] safeToArray(Collection<?> c) { Object[] a = c.toArray(); if (!isTrusted(c)) { a = Arrays.copyOf(a, a.length, Object[].class); } return a; } The delicate issue then is: what is the correct design of "isTrusted"? The check must be fast, and not spoofable by classes outside the JDK. My first thought was to detect that the class is loaded by the bootstrap class loader: static boolean isTrusted(Collection<?> c) { return c.getClass().getClassLoader() == null; } However, I now realize this is unwise because it would shift the burden of trust into **every** Collection class in the JDK, instead of into the classes that actually care about the result. For example, Collections.UnmodifiableCollection.toArray() simply returns the result of its internal `col.toArray()`, which immediately carves a hole the size of a bus thru the check. Perhaps better would be for classes to explicitly opt-in and say "Yes, I promise that my toArray() method is safe". Would a package-private marker interface work for this? I am not sure. Anyway, the point is that classes within the JDK should not need to create so many defensive copies of each other. Note that this is becoming more of an issue now that List.of is becoming a popular "go-to" list for many situations where ArrayList would traditionally have been used. The "only trust ArrayList" pattern was never ideal and is starting to age. This is my first post here so I apologize for oversights in formatting or etiquette. Thanks for reading.