[ https://issues.apache.org/jira/browse/CASSANDRA-20571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949894#comment-17949894 ]

Jai Bheemsen Rao Dhanwada edited comment on CASSANDRA-20571 at 5/7/25 4:52 AM:
-------------------------------------------------------------------------------

Please ignore the comment above: this 
[commit|https://github.com/apache/cassandra/commit/5be1038c5d38af32d3cbb0545d867f21304f3a46]
 is probably exposing the problem rather than introducing it, because without 
this commit the overall streaming is slower, which keeps CPU consumption low. 
This is clearly visible in the bootstrap time (26 minutes vs 3 minutes). I have 
run multiple tests with different scenarios and reviewed the streaming code, 
and here is what I see.
 * In 4.1, if the property {{streaming.session.parallelTransfers}} is not set, 
the outbound file-transfer executor is sized to the number of available 
processors, so streaming opens up that many parallel transfer threads.

{code:java}
// Ref: org/apache/cassandra/streaming/async/StreamingMultiplexedChannel.java
private static final int DEFAULT_MAX_PARALLEL_TRANSFERS = getAvailableProcessors();
private static final int MAX_PARALLEL_TRANSFERS =
    parseInt(getProperty(PROPERTY_PREFIX + "streaming.session.parallelTransfers",
                         Integer.toString(DEFAULT_MAX_PARALLEL_TRANSFERS)));
<<truncated output>>
fileTransferExecutor = executorFactory()
        .configurePooled("NettyStreaming-Outbound-" + name, MAX_PARALLEL_TRANSFERS)
        .withKeepAlive(1L, SECONDS)
        .build();

// Ref: org/apache/cassandra/utils/FBUtilities.java
public static int getAvailableProcessors()
{
    if (availableProcessors > 0)
        return availableProcessors;
    else
        return Runtime.getRuntime().availableProcessors();
}
{code}
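To make the default above concrete, here is a small standalone sketch (not Cassandra code; it just mirrors the resolution logic in the excerpt, assuming the property prefix resolves to {{cassandra.}} as the {{-Dcassandra.streaming.session.parallelTransfers}} flag used later suggests):

{code:java}
import static java.lang.Integer.parseInt;

public class ParallelTransfersDefaultDemo
{
    public static void main(String[] args)
    {
        // Fall back to the number of available processors when the system
        // property is not set, mirroring DEFAULT_MAX_PARALLEL_TRANSFERS.
        int defaultMax = Runtime.getRuntime().availableProcessors();
        int maxParallelTransfers =
            parseInt(System.getProperty("cassandra.streaming.session.parallelTransfers",
                                        Integer.toString(defaultMax)));

        // On a 16-core host with no -D flag this prints 16 (one outbound
        // NettyStreaming-Outbound thread per core); with the flag set to 1 it prints 1.
        System.out.println("max parallel transfers = " + maxParallelTransfers);
    }
}
{code}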
 * However, in 4.0, if the property {{streaming.session.parallelTransfers}} is 
not set, the parallelism depends on the ThreadPoolExecutor, whose core pool 
size defaults to 1, with the pool only then allowed to grow towards the maximum.

{code:java}
// Ref: org/apache/cassandra/streaming/async/NettyStreamingMessageSender.java
fileTransferExecutor = new DebuggableThreadPoolExecutor(1, MAX_PARALLEL_TRANSFERS, 1L, TimeUnit.SECONDS,
                                                        new LinkedBlockingQueue<>(),
                                                        new NamedThreadFactory("NettyStreaming-Outbound-" + name));

// Ref: ThreadPoolExecutor.class
/**
 * Creates a new {@code ThreadPoolExecutor} with the given initial
 * parameters and default rejected execution handler.
 *
 * @param corePoolSize the number of threads to keep in the pool, even
 *        if they are idle, unless {@code allowCoreThreadTimeOut} is set
 * @param maximumPoolSize the maximum number of threads to allow in the
 *        pool
 * @param keepAliveTime when the number of threads is greater than
 *        the core, this is the maximum time that excess idle threads
 *        will wait for new tasks before terminating.
 * @param unit the time unit for the {@code keepAliveTime} argument
 * @param workQueue the queue to use for holding tasks before they are
 *        executed.  This queue will hold only the {@code Runnable}
 *        tasks submitted by the {@code execute} method.
 * @param threadFactory the factory to use when the executor
 *        creates a new thread
 * @throws IllegalArgumentException if one of the following holds:<br>
 *         {@code corePoolSize < 0}<br>
 *         {@code keepAliveTime < 0}<br>
 *         {@code maximumPoolSize <= 0}<br>
 *         {@code maximumPoolSize < corePoolSize}
 * @throws NullPointerException if {@code workQueue}
 *         or {@code threadFactory} is null
 */
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory) {
    this(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue,
         threadFactory, defaultHandler);
} 


{code}
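For what it's worth, here is a minimal self-contained demo of why the 4.0 executor shape tends to stay at a single thread. It uses a plain {{java.util.concurrent.ThreadPoolExecutor}} rather than {{DebuggableThreadPoolExecutor}}, but the underlying rule is the same: a pool only starts threads beyond its core size when the work queue rejects a task, and an unbounded {{LinkedBlockingQueue}} never rejects, which matches the single outbound thread seen in the 4.0 DEBUG logs in the next bullet.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueuePoolDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        // Same shape as the 4.0 fileTransferExecutor: core=1, max=16, unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 16, 1L, TimeUnit.SECONDS,
                                                         new LinkedBlockingQueue<>());

        // Submit far more tasks than the core size. They pile up in the queue
        // instead of spawning extra threads, because additional threads are only
        // created when the queue refuses an offer, which an unbounded queue never does.
        for (int i = 0; i < 32; i++)
            pool.execute(() -> sleepQuietly(200));

        Thread.sleep(100);
        System.out.println("pool size = " + pool.getPoolSize()); // prints 1

        pool.shutdownNow();
    }

    private static void sleepQuietly(long millis)
    {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}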
 * This is evident from the DEBUG logs: in 4.1 the streaming threads range from 
{{NettyStreaming-Outbound-/x.x.x.x.7000:1}} to 
{{NettyStreaming-Outbound-/x.x.x.x.7000:16}} during streaming, whereas in 4.0 
there is always only the single thread {{NettyStreaming-Outbound-/x.x.x.x.7000:1}}.

{code:java}
DEBUG [NettyStreaming-Outbound-/x.x.x.x.7000:11] 2025-01-31 05:36:32,393 CassandraCompressedStreamWriter.java:62 - [Stream #0d829ec0-df95-11ef-9b5a-81b717c52836] Start streaming file /var/lib/cassandra/data/keyspace1/table1-212345705bc011e895a6cbf409872089/nb-16524-big-Data.db to /x.x.x.x:7000, repairedAt = 0, totalSize = 102241274
{code}
 * [This is the 
commit|https://github.com/apache/cassandra/commit/be1f050bc8c0cd695a42952e3fc84625ad48d83a#diff-1075097b8848b3055c2e17cdff45207c7889df8ec1eeae360a88f33a9168bc3f]
 which introduced this change to the task execution, in CASSANDRA-16925.
 * To prove this I ran the two experiments below, and in both cases the CPU 
spike I observed earlier did not occur:
 ** set {{-Dcassandra.streaming.session.parallelTransfers=1}}
 ** compile the C* code with {{fileTransferExecutor.setCorePoolSize(1);}}
 * IIUC, setting {{-Dcassandra.streaming.session.parallelTransfers=1}} will 
always cap the pool at 1 and never let it grow; is that the correct way to 
handle this? In 4.0.12 the maximum is the number of cores, so the pool could 
in principle grow.
 * I have also set the property {{streaming_connections_per_host: 1}}, which (I 
could be wrong, but) does not appear to be honored in 4.1.x. I could not 
confirm this; I only suspect it because 4.1 is creating 16 streaming sessions.
 * I ran another experiment by adding 
{{fileTransferExecutor.setCorePoolSize(MAX_PARALLEL_TRANSFERS);}} in 4.0.12, and 
I see exactly the same behavior in 4.0, with the CPU spiking during streaming 
(see the sketch below for why raising the core pool size has this effect).
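As a sketch of why that one-line change reproduces the 4.1-style fan-out (again using a plain {{ThreadPoolExecutor}} instead of Cassandra's subclass, so this is an illustration rather than the actual 4.0.12 modification): once the core pool size is raised, every submission below the core size starts a fresh thread, so the same burst of transfers runs on up to 16 threads instead of queueing behind one.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RaisedCorePoolDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        // Same executor shape as before, but with the core pool size raised to the
        // maximum, mirroring the setCorePoolSize(MAX_PARALLEL_TRANSFERS) experiment.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 16, 1L, TimeUnit.SECONDS,
                                                         new LinkedBlockingQueue<>());
        pool.setCorePoolSize(16);

        // Each submission below the new core size starts a fresh thread.
        for (int i = 0; i < 32; i++)
            pool.execute(() -> sleepQuietly(200));

        Thread.sleep(100);
        System.out.println("pool size = " + pool.getPoolSize()); // prints 16

        pool.shutdownNow();
    }

    private static void sleepQuietly(long millis)
    {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}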

Is it a bug that 4.1 tries to utilize all available processors, or is the 
expectation that the user should set 
{{-Dcassandra.streaming.session.parallelTransfers=1}}, which was not required in 
4.0.x? If this is expected, we should default it to 1 and let the user increase 
it as needed.

[~dcapwell] I am using open-source Apache Cassandra, but to run some of these 
tests I have modified the source code and built it.



> CPU Spikes during the Streaming of data
> ---------------------------------------
>
>                 Key: CASSANDRA-20571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20571
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming
>            Reporter: Jai Bheemsen Rao Dhanwada
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>         Attachments: async_profiler_cpu.html
>
>
> Hello Team,
> We are seeing an issue where there is a huge spike in CPU on the node that 
> is streaming data (adding a new node, replacing a node, or running a 
> nodetool rebuild). Essentially, any time streaming is involved, the CPU 
> spike is very large. This does not happen in all clusters, but we 
> occasionally see this issue on specific clusters.
>  
> C* version: 4.1.6 (> 4.1.0)
> Schema: All the tables use counter data types.
> CPU Cores: 16
>  
> The same workloads + cluster types do not show this behavior with the 4.0.x 
> version of Cassandra, hence we suspect something changed in 4.1.6. Looking at 
> the top threads, it's mostly StreamDeserialize + compaction.
> {code:java}
> top - 17:01:29 up 18:42,  2 users,  load average: 51.75, 13.61, 4.79
> Threads: 741 total,  54 running, 687 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 91.5 us,  4.9 sy,  0.0 ni,  1.4 id,  0.7 wa,  1.1 hi,  0.4 si,  0.0 st
> MiB Mem :  31176.5 total,   8762.5 free,  11028.0 used,  11386.0 buff/cache
> MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19334.3 avail Mem
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  305763 xxxxxx    20   0   18.6g   9.8g 446524 R  30.8  32.2   0:04.69 Stream-Deserial
>  305815 xxxxxx    20   0   18.6g   9.8g 446524 R  28.6  32.2   0:04.81 Stream-Deserial
>  300600 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:04.73 CompactionExecu
>  305678 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:03.98 Stream-Deserial
>  305602 xxxxxx    20   0   18.6g   9.8g 446524 R  27.6  32.2   0:04.65 Stream-Deserial
>  305563 xxxxxx    20   0   18.6g   9.8g 446524 R  27.3  32.2   0:04.02 CompactionExecu
>  305687 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.28 Stream-Deserial
>  305707 xxxxxx    20   0   18.6g   9.8g 446524 S  26.9  32.2   0:04.29 Stream-Deserial
>  305714 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.91 Stream-Deserial
>  305569 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:05.69 Stream-Deserial
>  305771 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:03.99 Stream-Deserial
>  305817 xxxxxx    20   0   18.6g   9.8g 446524 R  26.3  32.2   0:03.79 Stream-Deserial
>  305566 xxxxxx    20   0   18.6g   9.8g 446524 R  26.0  32.2   0:04.64 CompactionExecu {code}
> The initial hypothesis was that streaming_stats might be playing a role here, 
> based on https://issues.apache.org/jira/browse/CASSANDRA-18110. However, we 
> set streaming_stats: false and still see a spike in CPU. Once streaming is 
> complete the cluster returns to a normal state where we don't see a CPU 
> spike, but we would like to understand what's causing the huge CPU spikes. I 
> have attached an async-profiler capture taken during the CPU spike.
> Please let me know if you need any other details.


