[jira] [Comment Edited] (CASSANDRA-15700) Performance regression on internode messaging

Benedict Elliott Smith (Jira) Tue, 07 Apr 2020 08:26:28 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077336#comment-17077336
 ]


Benedict Elliott Smith edited comment on CASSANDRA-15700 at 4/7/20, 3:25 PM:
-----------------------------------------------------------------------------

So, I'm actually misremembering: though I had thought we scheduled exactly what 
you suggest (for which we must use something like {{earliestExpiresAt}}, and 
for which it would need to be maintained) we actually only use it to avoid 
wasted work performing a prune on enqueue when it is unnecessary to avoid 
head-of-line blocking.

Either way, it's the same basic problem (i.e. that we only maintain the minimum 
value gating prune work, and we should instead maintain some searchable 
approximation of the whole queue), and the solution is probably to maintain 
some kind of minimal histogram of the expiration times, or otherwise pick a 
number that isn't the minimum but amortises the prunes over multiple 
operations, that permits us to prune at worst a little later than we might 
while bounding the number of expired messages we leave in place.

It looks like we only schedule regular prunes when disconnected; we should 
probably fix that too.


was (Author: benedict):
So, I'm actually misremembering: though I had thought we scheduled exactly what 
you suggest (for which we must use something like {{earliestExpiresAt}}, and 
for which it would need to be maintained) we actually only use it to avoid 
wasted work performing a prune on enqueue when it is unnecessary to avoid 
head-of-line blocking.

Either way, it's the same basic problem (i.e. that we only maintain the minimum 
value gating prune work, and we should instead maintain some searchable 
approximation of the whole queue), and the solution is probably to maintain 
some kind of minimal histogram of the expiration times that permits us to prune 
at worst a little later than we might while bounding the number of expired 
messages we leave in place.

It looks like we only schedule regular prunes when disconnected; we should 
probably fix that too.

> Performance regression on internode messaging
> ---------------------------------------------
>
>                 Key: CASSANDRA-15700
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Normal
>             Fix For: 4.0
>
>         Attachments: Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, 
> oss40_system.log
>
>
> Me and [~jasonstack] have been investigating a performance regression 
> affecting 4.0 during a 3 nodes, RF 3 write throughput test with a timeseries 
> like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached  
> system log), while 3.11 has almost none.
> 2) An abnormal GC activity (as shown in the attached gc log).
> Turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge amount of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}} which 
> spends half of its time pruning. But only a tiny portion of such pruning time 
> is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc 
> "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to 
> understand: what's the reason to prune so often, rather than just when 
> polling the message during delivery? If there's a reason I'm missing, let's 
> talk about how to optimize pruning, otherwise let's get rid of that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (CASSANDRA-15700) Performance regression on internode messaging

Reply via email to