On Thu, Dec 18, 2025, 10:40 Piotr P. Karwasz wrote:
> Hi Emmanuel,
>
> On 18.12.2025 15:51, Emmanuel Bourg wrote:
> > Do we have Apache projects willing to use this library? Do you have a
> > code snippet in mind illustrating how it would be used and the amount of
> > code it would save?
> >
> I would certainly use such a library in Log4j. Today we already have
> two separate classes that independently configure a
> DocumentBuilderFactory ([1] and [2]). If the library is kept small and
> focused, I believe the rest of the PMC would be comfortable adding it
> as a non-optional dependency of `log4j-core`.
>
> Apache Tika may also be a good candidate. A recent CVE there
> (CVE-2025-54988 [4], rated 9.8 by CISA) was triggered by a missing
> hardening option on XMLInputFactory, which is exactly the kind of
> problem this library would aim to prevent.
>
> More generally, the value is less about lines of code saved and more
> about achieving consistent, secure behavior regardless of the JAXP
> implementation present on the classpath.
>
> For example, with the default JDK implementation on JDK 8+, the
> following is already secure:
>
>     DocumentBuilderFactory dbf =
>         DocumentBuilderFactory.newInstance();
>
> However, if an application happens to include `xercesImpl` (which is
> quite common in larger applications with older transitive
> dependencies), additional configuration is required:
>
>     dbf.setFeature(
>         "http://javax.xml.XMLConstants/feature/secure-processing",
>         true);
>     dbf.setFeature(
>         "http://xml.org/sax/features/external-general-entities",
>         false);
>     dbf.setFeature(
>         "http://xml.org/sax/features/external-parameter-entities",
>         false);
>     dbf.setFeature(
>         "http://apache.org/xml/features/nonvalidating/load-external-dtd",
>         false);
>
> On top of that, each call may throw an exception if the feature is not
> supported, leaving every project to decide how to handle partial or
> inconsistent hardening.
>
> The goal, therefore, is not just configuration convenience, but to
> answer a set of security-sensitive questions once and consistently,
> such as:
>
> - Should complex XML documents with recursively defined entities be
>   allowed?
> - Should external documents be fetched and merged during parsing?
>
> XML parsing is a minefield. For example, Apache Tika previously had an
> XML resolver that returned an empty string; depending on the parser
> implementation, this either added some protection or none at all.
>
> > Regarding the name, there is a risk of confusing the library with the
> > JAXP API. What about a more descriptive name such as commons-safe-xml or
> > commons-xml-parser?
> >
> Names like `commons-xml-parser`, `commons-xml-parsers`,
> or `commons-xml-utils` all sound reasonable to me.
>

Utils is never a good name. It's the kind of kitchen-sink name that leads to anything and everything getting dumped into it.

Gary

> Piotr
>
> [1] https://github.com/apache/logging-log4j2/blob/2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/xml/XmlConfiguration.java
> [2] https://github.com/apache/logging-log4j2/blob/2.x/log4j-1.2-api/src/main/java/org/apache/log4j/xml/XmlConfiguration.java
> [3] https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/utils/XMLReaderUtils.java
> [4] https://www.cve.org/CVERecord?id=CVE-2025-54988
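The quoted point about partial or inconsistent hardening can be illustrated with a small sketch, assuming a fail-closed policy. Everything here is hypothetical: `HardenedXml` and `newDocumentBuilderFactory()` are illustrative names, not an existing or proposed Commons API. Only the four feature URIs come from the message; the policy of throwing when any feature is rejected is just one possible answer to "what do we do when the JAXP implementation on the classpath does not support a setting".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/**
 * Hypothetical helper (not an existing Commons API): applies the hardening
 * features listed in the message and refuses to return a factory if the
 * JAXP implementation found on the classpath rejects any of them.
 */
public final class HardenedXml {

    // Feature URIs quoted in the message, mapped to the value each must have.
    private static final Map<String, Boolean> FEATURES = new LinkedHashMap<>();

    static {
        FEATURES.put(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        FEATURES.put("http://xml.org/sax/features/external-general-entities", false);
        FEATURES.put("http://xml.org/sax/features/external-parameter-entities", false);
        FEATURES.put("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    }

    private HardenedXml() {
    }

    public static DocumentBuilderFactory newDocumentBuilderFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        StringBuilder unsupported = new StringBuilder();
        for (Map.Entry<String, Boolean> feature : FEATURES.entrySet()) {
            try {
                dbf.setFeature(feature.getKey(), feature.getValue());
            } catch (ParserConfigurationException e) {
                // Record the failure instead of silently continuing half-hardened.
                unsupported.append(' ').append(feature.getKey());
            }
        }
        if (unsupported.length() > 0) {
            // Assumed policy: never hand out a partially hardened factory.
            throw new IllegalStateException(
                    dbf.getClass().getName() + " does not support:" + unsupported);
        }
        return dbf;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = newDocumentBuilderFactory();
        byte[] xml = "<config><appender name=\"console\"/></config>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        // Prints the root element name of the parsed document.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Replacing the exception with a logged warning would give the opposite policy; either way, the decision is made once, in one place, rather than separately in every project.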
