On Thu, Dec 18, 2025, 10:40 Piotr P. Karwasz <[email protected]>
wrote:

> Hi Emmanuel,
>
> On 18.12.2025 15:51, Emmanuel Bourg wrote:
> > Do we have Apache projects willing to use this library? Do you have a
> > code snippet in mind illustrating how it would be used and the amount of
> > code it would save?
>
>
> I would certainly use such a library in Log4j. Today we already have
> two separate classes that independently configure a
> DocumentBuilderFactory ([1] and [2]). If the library is kept small and
> focused, I believe the rest of the PMC would be comfortable adding it
> as a non-optional dependency of `log4j-core`.
>
> Apache Tika may also be a good candidate. A recent CVE there
> (CVE-2025-54988 [4], rated 9.8 by CISA) was triggered by a missing
> hardening option on XMLInputFactory, which is exactly the kind of
> problem this library would aim to prevent.
>
> More generally, the value is less about lines of code saved and more
> about achieving consistent, secure behavior regardless of the JAXP
> implementation present on the classpath.
>
> For example, with the default JDK implementation on JDK 8+, the
> following is already secure:
>
>   DocumentBuilderFactory dbf =
>       DocumentBuilderFactory.newInstance();
>
> However, if an application happens to include `xercesImpl` (which is
> quite common in larger applications with older transitive
> dependencies), additional configuration is required:
>
>   dbf.setFeature(
>       "http://javax.xml.XMLConstants/feature/secure-processing",
>       true);
>   dbf.setFeature(
>       "http://xml.org/sax/features/external-general-entities",
>       false);
>   dbf.setFeature(
>       "http://xml.org/sax/features/external-parameter-entities",
>       false);
>   dbf.setFeature(
>       "http://apache.org/xml/features/nonvalidating/load-external-dtd",
>       false);
>
> On top of that, each call may throw an exception if the feature is not
> supported, leaving every project to decide how to handle partial or
> inconsistent hardening.
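
A minimal sketch of one way to handle those per-feature exceptions, under stated assumptions: `HardenedFactorySketch`, `trySetFeature`, and `newHardenedFactory` are hypothetical names used for illustration, not an existing Commons, Log4j, or Tika API, and failing closed when a feature is rejected is only one possible policy.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper for illustration only; not part of any existing library.
final class HardenedFactorySketch {

    private HardenedFactorySketch() {}

    // Applies one feature and reports whether the implementation accepted it,
    // instead of letting ParserConfigurationException escape to the caller.
    static boolean trySetFeature(DocumentBuilderFactory dbf, String feature, boolean value) {
        try {
            dbf.setFeature(feature, value);
            return true;
        } catch (ParserConfigurationException e) {
            return false;
        }
    }

    // Applies the four settings quoted in the message above and fails closed
    // (assumed policy) if the JAXP implementation rejects any of them.
    static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        List<String> rejected = new ArrayList<>();

        if (!trySetFeature(dbf, XMLConstants.FEATURE_SECURE_PROCESSING, true)) {
            rejected.add(XMLConstants.FEATURE_SECURE_PROCESSING);
        }
        String[] disabled = {
            "http://xml.org/sax/features/external-general-entities",
            "http://xml.org/sax/features/external-parameter-entities",
            "http://apache.org/xml/features/nonvalidating/load-external-dtd"
        };
        for (String feature : disabled) {
            if (!trySetFeature(dbf, feature, false)) {
                rejected.add(feature);
            }
        }

        if (!rejected.isEmpty()) {
            throw new IllegalStateException("Hardening features rejected: " + rejected);
        }
        return dbf;
    }
}
```

A caller would then write `DocumentBuilderFactory dbf = HardenedFactorySketch.newHardenedFactory();` and get either a factory with all four settings applied or an exception naming the ones that were rejected. Whether to fail, log, or fall back in that case is exactly the kind of decision each project currently makes on its own.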
>
> The goal, therefore, is not just configuration convenience, but to
> answer a set of security-sensitive questions once and consistently,
> such as:
>
> - Should complex XML documents with recursively defined entities be
>   allowed?
> - Should external documents be fetched and merged during parsing?
>
> XML parsing is a minefield. For example, Apache Tika previously had an
> XML resolver that returned an empty string; depending on the parser
> implementation, this either added some protection or none at all.
>
> > Regarding the name, there is a risk of confusing the library with the
> > JAXP API. What about a more descriptive name such as commons-safe-xml or
> > commons-xml-parser?
>
>
> Names like `commons-xml-parser`, `commons-xml-parsers`,
> or `commons-xml-utils` all sound reasonable to me.
>

"Utils" is never a good name. It's the kind of kitchen-sink name that
leads to anything and everything getting dumped into it.

Gary


> Piotr
>
> [1]
>
> https://github.com/apache/logging-log4j2/blob/2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/xml/XmlConfiguration.java
> [2]
>
> https://github.com/apache/logging-log4j2/blob/2.x/log4j-1.2-api/src/main/java/org/apache/log4j/xml/XmlConfiguration.java
> [3]
>
> https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/utils/XMLReaderUtils.java
> [4] https://www.cve.org/CVERecord?id=CVE-2025-54988
>
