rzo1 opened a new pull request, #1117: URL: https://github.com/apache/opennlp/pull/1117
## Problem

Release/CI builds of `opennlp-docs` fail non-deterministically while reading the DocBook sources:

```
Failed to transform opennlp.xml.: Failure reading .../opennlp.xml: Remote host terminated the handshake: SSL peer shut down incorrectly
```

## Root cause

The docbkx-maven-plugin already loads an **offline** XML catalog from the bundled `net.sf.docbook:docbook-xml:5.0-all:resources` dependency (confirmed in debug output: `Catalogs to load: jar:.../docbook-xml-5.0-all-resources.zip!/docbook/catalog.xml`). That catalog maps:

- public id `-//OASIS//DTD DocBook XML 5.0//EN`
- system ids `http://www.oasis-open.org/docbook/xml/5.0/dtd/docbook.dtd` and `http://docbook.org/xml/5.0/dtd/docbook.dtd`

But the manual sources declared a DOCTYPE that matched **neither**:

- public id `-//OASIS//DTD DocBook XML V5.0//EN` (extra `V`)
- system id `https://cdn.docbook.org/schema/5.0/dtd/docbook.dtd` (host `cdn.docbook.org`)

With no catalog match, the parser falls back to fetching the DTD from `cdn.docbook.org` over the network at build time. That host (a shared host, not a real CDN) intermittently resets the TLS handshake (~75% failure observed), so the build fails at random. The "SSL handshake" error is a symptom, not the cause.

## Fix

Align the DOCTYPE public/system identifiers in all `opennlp-docs/src/docbkx/*.xml` files with the bundled catalog, so the DTD resolves locally. The DTD is **retained** (PDF/FO generation depends on it for correct whitespace handling).

## Verification

Built `opennlp-docs package` (HTML + PDF) with **offline mode and a dead HTTP/HTTPS proxy**, so that any network access would fail instantly. The result is `BUILD SUCCESS`, generating both `opennlp.html` and `opennlp.pdf`. The docs build no longer performs any network access and is fully deterministic.
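To guard against the mismatch reappearing, the DOCTYPE of each DocBook source could be checked against the identifiers the catalog is known to map. The following is only a minimal sketch and not part of this pull request: the function names, the `book` root element in the usage example, and the regex-based parsing (double-quoted identifiers only) are assumptions made for illustration. It conservatively flags a file unless both identifiers are mapped, which mirrors the fix described above; a real catalog resolver may accept a match on either one.

```python
import re
from pathlib import Path

# Identifiers that, per the description above, the bundled catalog maps.
CATALOG_PUBLIC_IDS = {"-//OASIS//DTD DocBook XML 5.0//EN"}
CATALOG_SYSTEM_IDS = {
    "http://www.oasis-open.org/docbook/xml/5.0/dtd/docbook.dtd",
    "http://docbook.org/xml/5.0/dtd/docbook.dtd",
}

# Assumption: the DOCTYPE uses the PUBLIC form with double-quoted identifiers.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def unmatched_ids(xml_text):
    """Return the DOCTYPE identifiers that the catalog does not map.

    An empty list means both identifiers are mapped (or no PUBLIC
    DOCTYPE was found), so no network fetch should be needed.
    """
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return []
    public_id, system_id = match.groups()
    missing = []
    if public_id not in CATALOG_PUBLIC_IDS:
        missing.append(public_id)
    if system_id not in CATALOG_SYSTEM_IDS:
        missing.append(system_id)
    return missing


if __name__ == "__main__":
    # Directory taken from the Fix section; run from the repository root.
    for path in sorted(Path("opennlp-docs/src/docbkx").glob("*.xml")):
        missing = unmatched_ids(path.read_text(encoding="utf-8"))
        if missing:
            print(f"{path}: not in catalog: {missing}")
```

Applied to a header such as `<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V5.0//EN" "https://cdn.docbook.org/schema/5.0/dtd/docbook.dtd">`, the function reports both identifiers as unmatched, whereas the aligned identifiers yield an empty list.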
