rzo1 opened a new pull request, #1080: URL: https://github.com/apache/opennlp/pull/1080
Backport of #1020 to `opennlp-1.x`.

## Problem

`DictionaryEntryPersistor.create()` built its `XMLReader` via the deprecated, insecure `XMLReaderFactory.createXMLReader()`, bypassing OpenNLP's secure XML parser configuration (XXE exposure).

## Fix

- Route dictionary parsing through `XmlUtil.createSaxParser().getXMLReader()`.
- Harden `XmlUtil` (both `createDocumentBuilder` and `createSaxParser`): disable external DTD/schema access and external general/parameter entities, disallow DOCTYPE declarations, disable XInclude and entity-reference expansion. `FEATURE_SECURE_PROCESSING` is attempted in a guarded block so platforms lacking it (e.g. Android) still work. Namespace awareness is set on the factory (replacing the old per-reader feature flag).

## 1.x adaptations vs. the 2.x change

- `opennlp-tools` on this branch has no slf4j; the unsupported-feature warning is emitted via `System.err`, the branch's logging idiom.

First of two stacked XmlUtil ports; the follow-up is OPENNLP-1835 (tolerate unsupported XML parser security options).

Tests: all `*Dictionary*` serializer tests pass on Temurin JDK 8.
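
For readers unfamiliar with the hardening listed under "Fix", the following is a minimal sketch of what a hardened SAX parser setup can look like with plain JAXP. It is not the actual `XmlUtil` code from this PR: the class name `HardenedSaxSketch` and the helper `parses` are hypothetical, and the feature URIs are the commonly used Xerces/SAX ones, assumed here rather than taken from the diff. Entity-reference expansion is a `DocumentBuilderFactory` setting and is not covered by this SAX-only sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of OpenNLP.
public class HardenedSaxSketch {

  static SAXParser createSaxParser() throws ParserConfigurationException, SAXException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Namespace awareness is configured on the factory, not per reader.
    factory.setNamespaceAware(true);
    factory.setXIncludeAware(false);

    // Guarded: some platforms may not support this feature.
    try {
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    } catch (ParserConfigurationException
        | SAXNotRecognizedException
        | SAXNotSupportedException e) {
      System.err.println("Secure processing not supported: " + e.getMessage());
    }

    // Assumed feature URIs (widely used with Xerces-based parsers).
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

    SAXParser parser = factory.newSAXParser();
    // An empty protocol list denies all external DTD/schema access (JAXP 1.5).
    parser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
    parser.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
    return parser;
  }

  // Returns true if the document parses, false if the parser rejects it.
  static boolean parses(String xml) {
    XMLReader reader;
    try {
      reader = createSaxParser().getXMLReader();
    } catch (ParserConfigurationException | SAXException e) {
      throw new IllegalStateException("Could not configure parser", e);
    }
    DefaultHandler handler = new DefaultHandler();
    reader.setContentHandler(handler);
    reader.setErrorHandler(handler); // fatalError rethrows the exception
    try {
      reader.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("plain document accepted: " + parses("<a/>"));
    System.out.println("DOCTYPE document accepted: "
        + parses("<!DOCTYPE a [<!ENTITY x \"y\">]><a>&x;</a>"));
  }
}
```

On a stock JDK parser, a document carrying a DOCTYPE declaration should be rejected by this configuration, which blocks the usual XXE vectors before any entity is resolved. Only the `FEATURE_SECURE_PROCESSING` call is guarded here; tolerating other unsupported options is what the description assigns to the follow-up OPENNLP-1835.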
