dpol1 opened a new pull request, #2997:
URL: https://github.com/apache/hugegraph/pull/2997
## Purpose of the PR
closes #2740
HugeGraphServer stops responding after Cassandra is restarted and never
recovers without a full server restart.
Root cause: `CassandraSessionPool` builds the DataStax `Cluster` without a
`ReconnectionPolicy`, `CassandraSession.execute(...)` calls the driver once
with no retry, and thread-local sessions are never probed for liveness.
Once Cassandra goes down, transient `NoHostAvailableException` /
`OperationTimedOutException` errors surface to the user and the pool stays
dead even after Cassandra comes back online.
## Main Changes
- Register `ExponentialReconnectionPolicy(baseDelay, maxDelay)` on the
`Cluster` builder so the DataStax driver keeps retrying downed nodes in
the background.
- Wrap every `Session.execute(...)` in `executeWithRetry(Statement)` with
exponential backoff on transient connectivity failures.
- Implement `reconnectIfNeeded()` / `reset()` on `CassandraSession` so the
pool reopens closed sessions and issues a lightweight health-check
(`SELECT now() FROM system.local`) before subsequent queries.
- Add four tunables in `CassandraOptions` (defaults preserve previous
behavior for healthy clusters):
| Option | Default | Meaning |
|--------|---------|---------|
| `cassandra.reconnect_base_delay` | `1000` ms | Initial backoff for driver reconnection policy |
| `cassandra.reconnect_max_delay` | `60000` ms | Cap for reconnection backoff |
| `cassandra.reconnect_max_retries` | `10` | Per-query retries on transient errors (`0` disables) |
| `cassandra.reconnect_interval` | `5000` ms | Base interval for per-query exponential backoff |
- Add unit tests covering defaults, overrides, disabling retries and option
keys.
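
For reviewers, the per-query retry behavior described above can be pictured roughly as follows. This is a minimal, driver-free sketch and not the code from this PR: `RetrySketch`, `TransientException`, `backoffDelay`, the `sleeper` hook, and the upper cap on the per-query delay are all hypothetical names and assumptions introduced for illustration. The only details taken from the description are the exponential backoff starting from a base interval and the rule that `0` retries disables retrying.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the HugeGraph code base.
final class RetrySketch {

    // Stand-in for transient driver errors such as NoHostAvailableException.
    static final class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    // Delay before retry number `attempt` (0-based): intervalMs doubled
    // `attempt` times, never exceeding capMs (the cap is an assumption).
    static long backoffDelay(long intervalMs, long capMs, int attempt) {
        long delay = intervalMs;
        for (int i = 0; i < attempt && delay < capMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMs);
    }

    // Runs the query; on a transient failure, waits and tries again, up to
    // maxRetries extra attempts. maxRetries == 0 rethrows the first failure.
    static <T> T executeWithRetry(Supplier<T> query, int maxRetries,
                                  long intervalMs, long capMs,
                                  LongConsumer sleeper) {
        for (int attempt = 0; ; attempt++) {
            try {
                return query.get();
            } catch (TransientException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                // The sleeper is injected so tests can record delays
                // instead of blocking the thread.
                sleeper.accept(backoffDelay(intervalMs, capMs, attempt));
            }
        }
    }

    public static void main(String[] args) {
        // Print the delay schedule for the default 5000 ms interval,
        // using 60000 ms as an assumed cap.
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("retry " + attempt + " waits "
                               + backoffDelay(5000, 60000, attempt) + " ms");
        }
    }
}
```

Passing the sleep function as a parameter is a choice made for this sketch only: it lets the retry schedule be checked without real waiting. In production code the hook would presumably wrap `Thread.sleep` and handle interruption.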
## Verifying these changes
- [x] Need tests and can be verified as follows:
  - `mvn -pl hugegraph-server/hugegraph-test -am test -Dtest=CassandraTest` (13/13 pass)
## Does this PR potentially affect the following parts?
- [x] Modify configurations
## Documentation Status
- [x] `Doc - TODO`