jmckenzie-dev commented on code in PR #178:
URL:
https://github.com/apache/cassandra-analytics/pull/178#discussion_r2907478233
##########
cassandra-analytics-cdc/src/main/java/org/apache/cassandra/cdc/Cdc.java:
##########
Review Comment:
> EX: if there is an out of memory exception during run method, do we stop
consumers completely? what if the memory usage goes down, how does the sidecar
start consumers again in that case?
In my experience, once you've hit an OOM you can no longer trust the java
process you're operating in to be in a good state. Any and all manner of
problems come up once you have a non-deterministic "blast" of items that failed
to allocate. Similar to my comment above about ordering and greedy resetting of
the `this.timerId` to -1 before we've actually tried to stop things and see if
they shut down gracefully or explode on execution, I think we face the same
risk here. Catching all Throwables in this way is "fine" if you have something
where you can reasonably recover from a failure state, but what do we do here
if we're in a hybrid failure state (i.e. some persisting to disk failed, some
didn't)? The logic around `persistToCassandra` will remove an activeState _even
if it's in a catastrophically failing state_ and simply re-enqueue it for the
next attempt.
This whole thing seems engineered to try and always keep the
`persistToCassandra` futures around until we know they're durably done, so we
move to just catch everything and absorb them on failure. I guess my concern is
the degenerate degradation case where we end up with a half-dead zombie process
because we over-allocated and then end up strewing OOM's everywhere. For
instance - if we're mid OOM, the following code's not going to re-enqueue the
calls to persist:
```
states.stream()
.map(this::persistToCassandra)
.filter(Objects::nonNull)
.forEach(activeFlush::add);
```
The design here of the handoff between `activeFlush` and `states` plus our
greedy "catch everything" exposes us to some nasty problems. =/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]