[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842121#comment-16842121 ]
Benedict edited comment on CASSANDRA-15013 at 5/17/19 12:01 PM:
----------------------------------------------------------------

Thanks [~sumanth.pasupuleti], the patch is looking really good. Some remaining questions:

* Do we need requestsProcessed metric? We already have {{regularStatementsExecuted}} and {{preparedStatementsExecuted}} which should track closely for the traffic we care about.
* Conversely, do we want some metric to track back pressure being deployed? It’s not clear exactly what semantics we would want to maintain here, since we don’t _currently_ pause all channels for a given endpoint when the endpoint overflows, and it’s also unclear if we would want to track this per-client (probably not, although it would be really nice to do so)
* I think it would be nice to manage {{requestPayloadInFlightPerEndpoint}} entirely inside {{EndpointPayloadTracker}} ; it's presently only accessed once outside in an adjacent class, but it would be very simple to hide the map entirely, as well as {{tryRef}}, and simply offer a {{public static get}} method in {{EndpointPayloadTracker}}. WDYT?

What do you also think about starting/stopping all channels for an endpoint at once, when we cross the threshold? I don't think it is essential, but is probably worth considering, as it makes our limits even less clearly defined (given we're permitted to cross them already, once per channel; it would be nice to tighten that to once per-endpoint)


was (Author: benedict):
Thanks [~sumanth.pasupuleti], the patch is looking really good. Some remaining questions:

* Do we need requestsProcessed metric? We already have {{regularStatementsExecuted}} and {{preparedStatementsExecuted}} which should track closely for the traffic we care about.
* Conversely, do we want some metric to track back pressure being deployed? It’s not clear exactly what semantics we would want to maintain here, since we don’t _currently_ pause all channels for a given endpoint when the endpoint overflows, and it’s also unclear if we would want to track this per-client (probably not, although it would be really nice to do so)
* I think it would be nice to manage {{requestPayloadInFlightPerEndpoint}} entirely inside EndpointPayloadTracker}}; it's presently only accessed once outside in an adjacent class, but it would be very simple to hide the map entirely, as well as {{tryRef}}, and simply offer a {{public static get}} method in {{EndpointPayloadTracker}}. WDYT?

What do you also think about starting/stopping all channels for an endpoint at once, when we cross the threshold? I don't think it is essential, but is probably worth considering, as it makes our limits even less clearly defined (given we're permitted to cross them already, once per channel; it would be nice to tighten that to once per-endpoint)

> Message Flusher queue can grow unbounded, potentially running JVM out of memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png, BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap dump showing each ImmediateFlusher taking upto 600MB.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue bounded, since, in the current state, items get added to the queue without any checks on queue size, nor with any checks on netty outbound buffer to check the isWritable state.
> We are seeing this issue hit our production 3.0 clusters quite often.
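
For illustration only, the encapsulation suggested in the comment above (keeping {{requestPayloadInFlightPerEndpoint}} and {{tryRef}} private and exposing only a {{public static get}}) might look roughly like the sketch below. This is not the code from the patch: the String endpoint key, the reference-counting scheme, and the {{release}}, {{add}} and {{bytesInFlight}} methods are all assumptions made to keep the example self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the patch: the map and tryRef are private,
// so callers can only obtain a tracker through get().
final class EndpointPayloadTracker
{
    private static final ConcurrentMap<String, EndpointPayloadTracker> requestPayloadInFlightPerEndpoint =
        new ConcurrentHashMap<>();

    private final String endpoint;                               // assumption: a String stands in for the client address
    private final AtomicInteger refCount = new AtomicInteger();  // -1 means "retired"
    private final AtomicLong bytesInFlight = new AtomicLong();

    private EndpointPayloadTracker(String endpoint)
    {
        this.endpoint = endpoint;
    }

    // The only entry point: returns a referenced tracker for the endpoint.
    public static EndpointPayloadTracker get(String endpoint)
    {
        while (true)
        {
            EndpointPayloadTracker tracker =
                requestPayloadInFlightPerEndpoint.computeIfAbsent(endpoint, EndpointPayloadTracker::new);
            if (tracker.tryRef())
                return tracker;
            // Lost a race with the final release(); drop the stale entry and retry.
            requestPayloadInFlightPerEndpoint.remove(endpoint, tracker);
        }
    }

    // Takes a reference unless the tracker has already been retired.
    private boolean tryRef()
    {
        while (true)
        {
            int count = refCount.get();
            if (count < 0)
                return false;
            if (refCount.compareAndSet(count, count + 1))
                return true;
        }
    }

    // Drops a reference; the last holder retires the tracker and removes it from the map.
    public void release()
    {
        if (refCount.decrementAndGet() == 0 && refCount.compareAndSet(0, -1))
            requestPayloadInFlightPerEndpoint.remove(endpoint, this);
    }

    // Hypothetical accounting helpers for the bytes currently in flight.
    public long add(long bytes)
    {
        return bytesInFlight.addAndGet(bytes);
    }

    public long bytesInFlight()
    {
        return bytesInFlight.get();
    }
}
```

In this sketch every channel for the same endpoint shares one tracker, which is also the natural place to hang a per-endpoint (rather than per-channel) pause decision if that were pursued.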