Hi Emmanuel,

On 18.12.2025 15:51, Emmanuel Bourg wrote:
> Do we have Apache projects willing to use this library? Do you have a
> code snippet in mind illustrating how it would be used and the amount of
> code it would save?


I would certainly use such a library in Log4j. Today we already have
two separate classes that independently configure a
DocumentBuilderFactory ([1] and [2]). If the library is kept small and
focused, I believe the rest of the PMC would be comfortable adding it
as a non-optional dependency of `log4j-core`.

Apache Tika [3] may also be a good candidate. A recent CVE there
(CVE-2025-54988 [4], rated 9.8 by CISA) was triggered by a missing
hardening option on XMLInputFactory, which is exactly the kind of
problem this library would aim to prevent.

More generally, the value is less about lines of code saved and more
about achieving consistent, secure behavior regardless of the JAXP
implementation present on the classpath.

For example, with the default JDK implementation on JDK 8+, the
following is already secure:

  DocumentBuilderFactory dbf =
      DocumentBuilderFactory.newInstance();

However, if an application happens to include `xercesImpl` (which is
quite common in larger applications with older transitive
dependencies), additional configuration is required:

  dbf.setFeature(
      "http://javax.xml.XMLConstants/feature/secure-processing",
      true);
  dbf.setFeature(
      "http://xml.org/sax/features/external-general-entities",
      false);
  dbf.setFeature(
      "http://xml.org/sax/features/external-parameter-entities",
      false);
  dbf.setFeature(
      "http://apache.org/xml/features/nonvalidating/load-external-dtd",
      false);

On top of that, each call may throw an exception if the feature is not
supported, leaving every project to decide how to handle partial or
inconsistent hardening.
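
To make the exception-handling question concrete, here is a rough
sketch of what a shared helper could look like. The class name
`HardenedXml` and its method are hypothetical, not an existing API, and
the "fail closed" policy (refusing to return a factory that could not
be fully hardened) is only one possible choice that the library would
have to decide on:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper for illustration only; not an existing library class.
final class HardenedXml {

    // The same four features as in the snippet above, with the values they must have.
    private static final Map<String, Boolean> FEATURES = new LinkedHashMap<>();

    static {
        FEATURES.put(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        FEATURES.put("http://xml.org/sax/features/external-general-entities", false);
        FEATURES.put("http://xml.org/sax/features/external-parameter-entities", false);
        FEATURES.put("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    }

    private HardenedXml() {
    }

    static DocumentBuilderFactory newDocumentBuilderFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        for (Map.Entry<String, Boolean> feature : FEATURES.entrySet()) {
            try {
                dbf.setFeature(feature.getKey(), feature.getValue());
            } catch (ParserConfigurationException e) {
                // Assumed policy: fail closed instead of returning a partially hardened factory.
                throw new IllegalStateException(
                        "Cannot harden " + dbf.getClass().getName()
                                + ": feature " + feature.getKey() + " is not supported", e);
            }
        }
        return dbf;
    }
}
```

A caller would then replace `DocumentBuilderFactory.newInstance()` with
`HardenedXml.newDocumentBuilderFactory()` and no longer needs to know
which features a given implementation understands.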

The goal, therefore, is not just configuration convenience, but to
answer a set of security-sensitive questions once and consistently,
such as:

- Should complex XML documents with recursively defined entities be
  allowed?
- Should external documents be fetched and merged during parsing?

XML parsing is a minefield. For example, Apache Tika previously had an
XML resolver that returned an empty string; depending on the parser
implementation, this either added some protection or none at all.

> Regarding the name, there is a risk of confusing the library with the
> JAXP API. What about a more descriptive name such as commons-safe-xml or
> commons-xml-parser?


Names like `commons-xml-parser`, `commons-xml-parsers`,
or `commons-xml-utils` all sound reasonable to me.

Piotr

[1]
https://github.com/apache/logging-log4j2/blob/2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/xml/XmlConfiguration.java
[2]
https://github.com/apache/logging-log4j2/blob/2.x/log4j-1.2-api/src/main/java/org/apache/log4j/xml/XmlConfiguration.java
[3]
https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/utils/XMLReaderUtils.java
[4] https://www.cve.org/CVERecord?id=CVE-2025-54988
