Random thought. "Any Java library that parses XML has to harden JAXP
before handing a factory to user code, and every library ends up
copy-pasting the same hardening snippet. The snippet is fragile: the
attributes and features needed to harden a factory are not
standardised, each JAXP implementation exposes a slightly different
set, and setting an unknown one throws an exception that callers
routinely swallow."

So instead of trying to figure out which parser you're configuring and
using a provider-based architecture, why not just set every property on
every factory and swallow the exceptions for the ones that aren't
supported? Are there any cases that actively conflict?
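
To make the question concrete, here is a rough sketch of what I mean
for DocumentBuilderFactory only. The class and method names
(BestEffortHardening, harden) are made up for illustration and are not
from the draft library; the feature URIs are the usual SAX/Xerces ones,
and I'm assuming the ACCESS_EXTERNAL_* constants from JAXP 1.5 are
available. Rather than swallowing failures silently, it returns the
names the implementation rejected:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper, not part of commons-xml-factory.
final class BestEffortHardening {

    /** Tries every known setting; returns the names this implementation rejected. */
    public static List<String> harden(DocumentBuilderFactory dbf) {
        List<String> rejected = new ArrayList<>();

        // Features: set each one independently so one failure does not skip the rest.
        Map<String, Boolean> features = new LinkedHashMap<>();
        features.put(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        features.put("http://apache.org/xml/features/disallow-doctype-decl", true);
        features.put("http://xml.org/sax/features/external-general-entities", false);
        features.put("http://xml.org/sax/features/external-parameter-entities", false);
        features.put("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        for (Map.Entry<String, Boolean> e : features.entrySet()) {
            try {
                dbf.setFeature(e.getKey(), e.getValue());
            } catch (ParserConfigurationException ex) {
                rejected.add(e.getKey());
            }
        }

        // Attributes: an empty string means "no protocols allowed".
        String[] attributes = {
            XMLConstants.ACCESS_EXTERNAL_DTD,
            XMLConstants.ACCESS_EXTERNAL_SCHEMA
        };
        for (String name : attributes) {
            try {
                dbf.setAttribute(name, "");
            } catch (IllegalArgumentException ex) {
                rejected.add(name);
            }
        }

        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return rejected;
    }
}
```

A caller could log or inspect the returned list, which at least keeps
the "routinely swallowed" failures visible. Whether the settings that
remain are sufficient for a given implementation is exactly the part
this sketch does not answer.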


On Fri, Apr 24, 2026 at 7:35 AM Piotr P. Karwasz
<[email protected]> wrote:
>
> Hi all,
>
> I finally pushed an initial draft of the Commons XML Factory project I
> proposed back in December [1]:
>
> https://github.com/copernik-eu/commons-xml-factory
>
> The library is a single `XmlFactories` class with factory methods that
> return hardened JAXP factories for:
>
> - DocumentBuilderFactory
> - SAXParserFactory
> - XMLInputFactory
> - TransformerFactory
> - SchemaFactory
> - XPathFactory
>
> Internally, each factory method dispatches to a per-implementation
> `XmlProvider` that applies the correct hardening for that
> implementation. The SPI is open via `ServiceLoader`, but providers for
> the JDK, Xerces, Woodstox and Saxon are bundled.
>
> It's fair to ask whether this is worth a library at all: a per-factory
> hardening recipe is only a handful of lines, and most projects wrote
> their own years ago. Two observations:
>
> First, that handful of lines is exactly what people forget or get
> subtly wrong. The 2025 Java XXE CVEs bear this out: Apache Tika
> (CVE-2025-54988, CVE-2025-66516), WebDriverManager (CVE-2025-4641),
> CycloneDX (CVE-2025-64518), GeoServer (CVE-2025-58360).
>
> Second, the correct recipe depends on which JAXP implementation is
> actually on the classpath, and that's often not what the developer
> thinks. A library author tests against the JDK, observes that
> FEATURE_SECURE_PROCESSING transitively restricts ACCESS_EXTERNAL_*
> (JEP 185), and writes a minimal hardening block. The library is then
> deployed in an application that pulls in external Xerces transitively:
> JEP 185 no longer applies, ACCESS_EXTERNAL_* is not honored, and the
> minimal block is no longer sufficient.
>
> The draft intentionally offers no configuration: it hardens at one
> level and fails fast if it encounters an implementation it doesn't
> recognize. Before extending it, I'd like feedback on whether the
> proposed direction makes sense.
>
> I see three plausible hardening levels worth supporting:
>
> 1. No DOCTYPE allowed. Eliminates the entire class of DTD-based
>    attacks. This is what the draft implements.
>
> 2. DOCTYPE allowed, no external resources loaded. Internal entities
>    work (for users who need HTML-style named entities, for example),
>    entity expansion limits are enforced, but nothing is fetched from
>    outside the document.
>
> 3. DOCTYPE allowed, user-supplied resolver. The caller provides an
>    EntityResolver; we wrap it so that if the resolver returns null for
>    an unknown reference, we throw rather than falling through to the
>    parser's default URL-fetching behavior. This closes SAX's most
>    common footgun while letting integrators implement classpath-scoped
>    loading, XML catalogs, and similar.
>
> The draft also addresses the secondary-source problem for
> TransformerFactory (stylesheet loading) and SchemaFactory (schema
> imports). Currently both are locked down as tightly as primary input,
> but this is probably a place where two distinct levels make sense:
> users often have trusted stylesheets or schemas they want to load via
> xsl:import or xs:include, separate from the question of what to allow
> in the document being transformed or validated.
>
> Two things I'd particularly appreciate feedback on:
>
> - Does the three-level model above cover the use cases you'd want to
>   bring to this library?
>
> - For the secondary-source question, is there appetite for a separate
>   axis, or should primary and secondary be tied together under a
>   single level?
>
> Piotr
>
> [1] https://lists.apache.org/thread/b2tjc15vjkgsrxxkc8phlnt6801hx4xz
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
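
For reference, the level 3 wrapping described in the quoted message
(throw when the user's resolver returns null instead of letting the
parser fall back to fetching the URL) could look roughly like the
following. This is only a sketch of the idea as described above; the
class name FailClosedResolver is hypothetical and I have not checked
how the draft would actually implement it:

```java
import java.io.IOException;
import java.util.Objects;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper, not part of commons-xml-factory.
final class FailClosedResolver implements EntityResolver {

    private final EntityResolver delegate;

    FailClosedResolver(EntityResolver delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource source = delegate.resolveEntity(publicId, systemId);
        if (source == null) {
            // A null return normally tells the parser to open systemId itself;
            // fail here instead so nothing is fetched by default.
            throw new SAXException("Unresolved external entity: publicId="
                    + publicId + ", systemId=" + systemId);
        }
        return source;
    }
}
```

It would be installed with something like
builder.setEntityResolver(new FailClosedResolver(userResolver)).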


-- 
Elliotte Rusty Harold
[email protected]

