jnioche opened a new pull request, #8416:
URL: https://github.com/apache/storm/pull/8416
## Problem
In `LocalFsBlobStore.prepare()`, the call to
`ClusterUtils.mkStormClusterState()` is wrapped in a try/catch that swallows
any exception, merely printing it via `e.printStackTrace()`:
```java
try {
    this.stormClusterState = ClusterUtils.mkStormClusterState(conf,
            new ClusterStateContext(DaemonType.NIMBUS, conf));
} catch (Exception e) {
    e.printStackTrace(); // stormClusterState remains null
}
```
If this call fails (for example, because ZooKeeper is unreachable at startup),
`stormClusterState` stays `null` and `prepare()` returns normally. The
failure is visible only as a stack trace on stderr, which is easily lost in
production log aggregation pipelines.
This creates a time bomb: the blob store appears to initialise correctly,
but every subsequent operation that touches the cluster state throws a
`NullPointerException` with no indication of the root cause:
- `startSyncBlobs()`: `this.stormClusterState.blobstore(...)`
- `setupBlobstore()`: `state.activeKeys()` (where `state = stormClusterState`)
- `blobSync()`: `state.blobstore(...)`
The NPE stack trace points into blob store internals, making it hard to see
that the real problem was a ZooKeeper connection failure that happened
earlier, during `prepare()`.
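A minimal, self-contained sketch of this failure mode follows. `ClusterState`, `connect()` and `SwallowingStore` are hypothetical stand-ins invented for illustration, not Storm APIs; `connect()` always throws to simulate an unreachable ZooKeeper.

```java
import java.io.IOException;

public class SwallowedInitDemo {
    // Hypothetical stand-in for the cluster state interface; not a Storm API.
    interface ClusterState {
        String blobstore(String key);
    }

    // Hypothetical stand-in for ClusterUtils.mkStormClusterState():
    // always fails, as if ZooKeeper were unreachable.
    static ClusterState connect() throws Exception {
        throw new IOException("ZooKeeper unreachable");
    }

    static class SwallowingStore {
        ClusterState state;

        void prepare() {
            try {
                state = connect();
            } catch (Exception e) {
                e.printStackTrace(); // state stays null, yet prepare() returns normally
            }
        }

        String startSyncBlobs() {
            return state.blobstore("some-key"); // NPE here, far from the real cause
        }
    }

    public static void main(String[] args) {
        SwallowingStore store = new SwallowingStore();
        store.prepare(); // "succeeds": only a stack trace on stderr
        try {
            store.startSyncBlobs();
        } catch (NullPointerException npe) {
            // The JVM-raised NPE has no cause attached; the IOException is gone.
            System.out.println("NPE, cause: " + npe.getCause()); // prints "NPE, cause: null"
        }
    }
}
```

`prepare()` returns normally, and the later NPE carries no cause, so the original `IOException` survives only in the stderr output.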
## Fix
Rethrow the exception as a `RuntimeException` so that `prepare()` itself
fails with a clear message and the original cause preserved:
```java
try {
    this.stormClusterState = ClusterUtils.mkStormClusterState(conf,
            new ClusterStateContext(DaemonType.NIMBUS, conf));
} catch (Exception e) {
    throw new RuntimeException("Failed to initialize cluster state for LocalFsBlobStore", e);
}
```
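For comparison, here is the same hypothetical setup with the fix applied, again using invented stand-ins (`ClusterState`, `connect()`, `FailFastStore`) rather than real Storm classes. It shows that the wrapping exception carries both the context message and the original cause, so the caller sees the ZooKeeper failure directly.

```java
import java.io.IOException;

public class FailFastInitDemo {
    // Hypothetical stand-in for the cluster state interface; not a Storm API.
    interface ClusterState {
        String blobstore(String key);
    }

    // Hypothetical stand-in for ClusterUtils.mkStormClusterState();
    // simulates an unreachable ZooKeeper.
    static ClusterState connect() throws Exception {
        throw new IOException("ZooKeeper unreachable");
    }

    static class FailFastStore {
        ClusterState state;

        void prepare() {
            try {
                state = connect();
            } catch (Exception e) {
                // Same shape as the fix: wrap and rethrow, chaining the cause.
                throw new RuntimeException("Failed to initialize cluster state for LocalFsBlobStore", e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            new FailFastStore().prepare();
        } catch (RuntimeException e) {
            // Both the context message and the root cause are available here.
            System.out.println(e.getMessage());
            // prints "Failed to initialize cluster state for LocalFsBlobStore"
            System.out.println("caused by: " + e.getCause());
            // prints "caused by: java.io.IOException: ZooKeeper unreachable"
        }
    }
}
```

Wrapping in an unchecked exception means callers need no new `throws` clause, while exception chaining still exposes the root cause in the stack trace.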
This matches the pattern already used a few lines above in the same method
for `FileBlobStoreImpl` initialisation, making the two failure modes consistent:
```java
try {
    fbs = new FileBlobStoreImpl(baseDir, conf);
} catch (IOException e) {
    throw new RuntimeException(e); // existing pattern
}
```
Nimbus now fails fast at startup with a meaningful error rather than
entering a degraded state where blob operations crash with confusing NPEs.