rzo1 commented on PR #1103:
URL: https://github.com/apache/opennlp/pull/1103#issuecomment-4779676453
One design point worth addressing here (and a convention I'd like us to hold
to generally): `Confusables` loads its data eagerly in a **static initializer**:
```java
private static final Map<Integer, String> PROTOTYPES = load(); // runs at
class-load time
```
In an application container the classloader that loads this class is often
**not** the loader the app runs with (servlet containers, OSGi,
shaded/relocated jars, modular setups). When that happens,
`getResourceAsStream` can't see the bundled resource, `load()` throws, and the
failure surfaces as an `ExceptionInInitializerError` — which then **permanently
poisons the class**: every subsequent use throws `NoClassDefFoundError` with
the original cause gone. It's eager and unrecoverable, and it happens at
class-load rather than at first real use.
Suggestion: make it **lazy and recoverable** instead — a double-checked
`prototypes()` accessor that calls `load()` on first use, so a failure throws a
normal exception you can catch/retry rather than killing the class for the
lifetime of the loader.
To be clear this isn't specific to `Confusables` — it's a general rule for
**any resource/model loading**: we should never do classpath resource loading
inside a `static {}` block / static-final initializer. (The inline
`List.of(...)` static blocks in `UnicodeWhitespace`/`UnicodeDash` are fine — no
I/O, no classloader risk.) Let's keep new code lazy here, and treat existing
eager-static loaders elsewhere as something to migrate over time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]