On Thu, 4 Sept 2025 at 03:17 UTC, Stuart Marks <stuart.ma...@oracle.com> wrote: > > Hi Shaun, > > You got pretty far with your analysis; congratulations!
Thank you!! > You already proposed and knocked down the "obvious solution" which is to > trust JDK > classes (which are loaded by the boot classloader). Similar simple solutions, > such > as an allowlist with `instanceof` checks, can be defeated by subclassing. Those two checks `&&`ed together would work, though, I think? Here's a rough census of some ways to identify "trusted" collection classes: * ClassLoader -- not selective enough on its own * instanceof ClassName, combined with classloader check -- has the hazard of being inheritable within the JDK; won't work for private collection classes; O(n) where n is the number of classes checked * instanceof MarkerInterface, combined with classloader check -- has the hazard of being inheritable within the JDK; presence of marker interface would be visible outside the JDK via reflection, which is inelegant but mostly harmless * .getClass().getInterfaces().contains(MarkerInterface) -- non-heritable version of instanceof * A centrally-maintained table of trusted Class objects -- non-heritable; won't work for private collection classes; causes early class loading * A centrally-maintained table of trusted class *names* -- avoids premature class loading; a brittle binding because it's not checked at compile time * Marker annotation on class -- could be very tidy but performance is unclear and might have bootstrapping problems?? * Each participating class uses a static initializer to call a JDK internal method to add its Class object to a dynamically populated table of trusted classes -- extra complexity of dynamic table but I believe it could work * more??? It's something that would need checking by a security expert. Any of the table-based solutions have the potential to become quite large. If any JDK Collection is potentially allowed to join the "trusted" club, that's a huge type hierarchy. So there is the question of how to achieve O(1) lookup. I would instinctively reach for an IdentityHashMap but maybe that's not bootstrap-friendly, I don't know. Or maybe the trusted club really should just be very small and insular. Trading security for performance is short-sighted. Also, I see no options for anonymous collection classes to participate, except stop being anonymous. Anonymous classes don't have proper names, can't implement extra interfaces, can't have class annotations, can't refer to their class object in a static initializer, etc. There are very few of these, although java.util.AbstractMap.keySet() is an example. > But this would still leave non-JDK classes out in the cold. An arrangement > that > could work would be to have some construct that allows the creation of and > writing > to an array, but which precludes the possibility of retaining a reference to > the > array (or perhaps merely preventing writes to it after a certain point). One > could > imagine a library construct that implements this. Of course it would require > classes > to opt into this arrangement. Ordinary implementors of toArray() would still > suffer > an extra copy. It's remarkable, because at its core it's such a basic thing for a computer program to do: copy a list and stick it in another. The bugs that result are highly non-obvious. When the need is basic but the correct solution is subtle, it's the kind of bug that will keep happening. Not just with ArrayList, but with anything conceptually similar. It's like Java's equivalent of a buffer overflow. The language design makes it inevitable for this to go wrong. I think the simple, widely-applicable List.of and friends are very helpful here because they make it easy for applications to move data around securely and immutably, to do things the right way even if they didn't realize there was a wrong way. If there were also a solution for primitives -- frozen arrays or primitive generic lists -- that could cover some more ground. Around 2010, I wrote a terrible hack for the usual wrong reason (performance): there was a ByteArrayOutputStream with a large blob of data, and I wanted to avoid the clone cost of calling toByteArray(), so I did `baos.writeTo(new OutputStream() { ... })` with an overridden `OutputStream.write` that simply exfiltrated the passed buffer argument -- letting me access, even modify, the internal buffer of the ByteArrayOutputStream. My intent wasn't nefarious. But the general principle is this: arrays are dangerous. They're always mutable, widely shared around, and you can't stop someone retaining a reference, eventually causing problems deliberately or accidentally. I had a vision similar to what you describe: Some kind of library class that takes an array and provides a temporary, closeable view of a slice of it, so access can be shared (read-only, write-only, or read-write) inside an enforceable scope. But I think such a class would be too cumbersome to see wide use unless its views could be typecast to a "normal" array type. Then it could be used with traditional APIs like `Collection.toArray(Object[])`, or `OutputStream.write(byte[])`. But that turns a simple utility class into a serious language and VM engineering project. > But maybe this is overkill, and the best is the enemy of the good. Perhaps we > could > add a short allowlist that contains just the most frequently-used JDK > classes. This > wouldn't be exhaustive, but it could help the situation by covering more of > the > common cases. Yes, 100%. The Pareto principle applies; most of the available benefit would come from just a few cases. It would be nice if at least ArrayList and List.of could trust each other, providing a smooth transition between the mutable and immutable realms. > > Also, the knowledge of why the defensive clone is made at all, and why > > it must be of type Object[] and not a covariant subclass, these are > > currently arcane secrets held only by those who know the history of > > the associated bug reports. That's another part of why I argue for a > > central method that can be named and discussed. > The need for an extra defensive copy was established and fixed via the > security > vulnerability process and as such, all discussion of it was embargoed. Sorry. > [...] > But I think you've discerned the issue yourself already. In 2014 I sent in a bug report, JDK-8048200, the "Widget" example showing some toArray() mischief. If I recall correctly, it was closed as a duplicate of some earlier report or reports, ultimately disregarded because the exploit did not seem potent enough to justify the cost of blocking it. Years later I noticed the change to ArrayList, and discovered that my bug report and dupes had been quietly disappeared from the web. I can't even see the title. So I realized there had been some kind of blowup. :) Anyway, what I was wanting to say by complaining about the "arcane secrets" is simply it's another reason why a documented method is beneficial, to leave signposts in the code when dragons are nearby. Another 50 years from now, those classes will still be in use. If someone in the future reworking PriorityBlockingQueue sees this: if (c.getClass() != java.util.ArrayList.class) es = Arrays.copyOf(es, n, Object[].class); they won't appreciate the detailed thought behind those two lines unless they decide to do some extensive digital archeology. A commented method can refer to past bug reports, ward off future bugs, and help promote the right way of doing things in general. I've read your blog post on the subject of "The Importance of Writing Stuff Down" :) I imagine it was initially undesirable to draw attention to the code in an open-source security patch, but it's been 5 years. ~