This all looks great, Piotr. Thank you for putting it together. I would 100%
support and help maintain this library.

I have minor comments for now:

The name XmlFactories reads oddly to me. It's a factory that produces
different kinds of XML-related objects, so I'd just call it XmlFactory.

I would put everything in one package and make as much as possible
package-private.

Thank you again!
Gary

On Fri, Apr 24, 2026, 03:35 Piotr P. Karwasz <[email protected]>
wrote:

> Hi all,
>
> I finally pushed an initial draft of the Commons XML Factory project I
> proposed back in December [1]:
>
> https://github.com/copernik-eu/commons-xml-factory
>
> The library is a single `XmlFactories` class with factory methods that
> return hardened JAXP factories for:
>
> - DocumentBuilderFactory
> - SAXParserFactory
> - XMLInputFactory
> - TransformerFactory
> - SchemaFactory
> - XPathFactory
>
> Internally, each factory method dispatches to a per-implementation
> `XmlProvider` that applies the correct hardening for that
> implementation. The SPI is open via `ServiceLoader`, but providers for
> the JDK, Xerces, Woodstox and Saxon are bundled.
>
> It's fair to ask whether this is worth a library at all: a per-factory
> hardening recipe is only a handful of lines, and most projects wrote
> their own years ago. Two observations:
>
> First, that handful of lines is exactly what people forget or get
> subtly wrong. The 2025 Java XXE CVEs bear this out: Apache Tika
> (CVE-2025-54988, CVE-2025-66516), WebDriverManager (CVE-2025-4641),
> CycloneDX (CVE-2025-64518), GeoServer (CVE-2025-58360).
>
> Second, the correct recipe depends on which JAXP implementation is
> actually on the classpath, and that's often not what the developer
> thinks. A library author tests against the JDK, observes that
> FEATURE_SECURE_PROCESSING transitively restricts ACCESS_EXTERNAL_*
> (JEP 185), and writes a minimal hardening block. The library is then
> deployed in an application that pulls in external Xerces transitively:
> JEP 185 no longer applies, ACCESS_EXTERNAL_* is not honored, and the
> minimal block is no longer sufficient.
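
A minimal sketch of a hardening block that does not depend on JEP 185, assuming a DOM parser and the "no DOCTYPE" policy. `HardenedDomSketch` and `newHardenedFactory` are made-up names for illustration, not the draft's API. The sketch sets the Xerces `disallow-doctype-decl` feature explicitly and treats the JAXP 1.5 ACCESS_EXTERNAL_* properties as best effort, since an external Xerces may reject them as unknown:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the draft library.
final class HardenedDomSketch {

    // Xerces feature name, understood by the JDK's built-in parser and by external Xerces.
    private static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Do not assume secure processing implies anything else: reject DOCTYPE explicitly.
        dbf.setFeature(DISALLOW_DOCTYPE, true);
        for (String property : new String[] {
                XMLConstants.ACCESS_EXTERNAL_DTD, XMLConstants.ACCESS_EXTERNAL_SCHEMA}) {
            try {
                dbf.setAttribute(property, ""); // empty string: no protocol allowed
            } catch (IllegalArgumentException e) {
                // The implementation does not know this JAXP 1.5 property;
                // the DOCTYPE ban above still applies.
            }
        }
        return dbf;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE a [<!ENTITY x \"y\">]><a>&x;</a>";
        try {
            newHardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            System.out.println("parsed (unexpected)");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real provider would presumably detect the implementation first rather than swallow the exception; the catch block here only illustrates why one recipe does not fit every classpath.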
>
> The draft intentionally offers no configuration: it hardens at one
> level and fails fast if it encounters an implementation it doesn't
> recognize. Before extending it, I'd like feedback on whether the
> proposed direction makes sense.
>
> I see three plausible hardening levels worth supporting:
>
> 1. No DOCTYPE allowed. Eliminates the entire class of DTD-based
>    attacks. This is what the draft implements.
>
> 2. DOCTYPE allowed, no external resources loaded. Internal entities
>    work (for users who need HTML-style named entities, for example),
>    entity expansion limits are enforced, but nothing is fetched from
>    outside the document.
>
> 3. DOCTYPE allowed, user-supplied resolver. The caller provides an
>    EntityResolver; we wrap it so that if the resolver returns null for
>    an unknown reference, we throw rather than falling through to the
>    parser's default URL-fetching behavior. This closes SAX's most
>    common footgun while letting integrators implement classpath-scoped
>    loading, XML catalogs, and similar.
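
The wrapper in level 3 could be sketched roughly like this. `StrictEntityResolver` is a hypothetical name, and the draft may well structure it differently; the only point is that a null from the caller's resolver becomes an exception instead of a silent fallback to the parser's default fetching:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Objects;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper for illustration; not taken from the draft.
final class StrictEntityResolver implements EntityResolver {

    private final EntityResolver delegate;

    StrictEntityResolver(EntityResolver delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource source = delegate.resolveEntity(publicId, systemId);
        if (source == null) {
            // Returning null here would tell the parser to open systemId itself.
            throw new SAXException("Unresolved external entity: publicId=" + publicId
                    + ", systemId=" + systemId);
        }
        return source;
    }

    public static void main(String[] args) throws Exception {
        // Assumed caller-supplied resolver that only knows one identifier.
        EntityResolver known = (publicId, systemId) ->
                "urn:example:known".equals(systemId)
                        ? new InputSource(new StringReader(""))
                        : null;
        EntityResolver strict = new StrictEntityResolver(known);
        System.out.println(strict.resolveEntity(null, "urn:example:known") != null);
        try {
            strict.resolveEntity(null, "http://example.com/evil.dtd");
        } catch (SAXException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

An integrator would install it with `DocumentBuilder.setEntityResolver` or `XMLReader.setEntityResolver`, passing their own classpath or catalog resolver as the delegate.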
>
> The draft also addresses the secondary-source problem for
> TransformerFactory (stylesheet loading) and SchemaFactory (schema
> imports). Currently both are locked down as tightly as primary input,
> but this is probably a place where two distinct levels make sense:
> users often have trusted stylesheets or schemas they want to load via
> xsl:import or xs:include, separate from the question of what to allow
> in the document being transformed or validated.
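
To make the secondary-source question concrete, here is a rough sketch of a separate axis for stylesheet loading, assuming the JDK's built-in transformer (other implementations may reject these attributes with IllegalArgumentException). `SecondarySourceSketch` and its `stylesheetProtocols` parameter are invented for illustration and do not reflect the draft's API:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration; not part of the draft library.
final class SecondarySourceSketch {

    /**
     * @param stylesheetProtocols protocols allowed for xsl:import / xsl:include,
     *        e.g. "" for none or "file" for local trusted stylesheets
     */
    static TransformerFactory newFactory(String stylesheetProtocols)
            throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // External DTDs stay blocked regardless of the stylesheet setting.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        // Secondary sources get their own, independently chosen setting.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, stylesheetProtocols);
        return tf;
    }

    public static void main(String[] args) throws Exception {
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:import href=\"file:///no-such-dir/trusted.xsl\"/>"
                + "</xsl:stylesheet>";
        try {
            newFactory("").newTemplates(new StreamSource(new StringReader(xsl)));
            System.out.println("compiled (unexpected)");
        } catch (TransformerConfigurationException e) {
            System.out.println("import refused: " + e.getMessage());
        }
    }
}
```

Tying primary and secondary together would amount to always passing "" here; a separate axis would let callers pass something like "file" for stylesheets they trust.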
>
> Two things I'd particularly appreciate feedback on:
>
> - Does the three-level model above cover the use cases you'd want to
>   bring to this library?
>
> - For the secondary-source question, is there appetite for a separate
>   axis, or should primary and secondary be tied together under a
>   single level?
>
> Piotr
>
> [1] https://lists.apache.org/thread/b2tjc15vjkgsrxxkc8phlnt6801hx4xz