[ 
https://issues.apache.org/jira/browse/CASSANDRA-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19979:
-----------------------------------
    Description: 
CASSANDRA-15452 introduces an internal buffer to compaction in order to 
increase throughput while reducing IOPS.  We can do the same thing on our 
streaming slow path.  There's a common misconception that the cost comes from 
serialization/deserialization, but I've found that on many devices the 
overhead is due to our read patterns. This is most common on non-NVMe drives, 
especially disaggregated storage such as EBS, where latency is higher and more 
variable.

Attached is a perf profile showing that the cost of streaming is dominated by 
pread.  The team I was working with was seeing only 12MB streamed per 
streaming session.  Reducing the number of read operations via internal 
buffered reads should improve this by at least 3-5x, and should also cut CPU 
overhead by making fewer system calls.
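
To illustrate the effect (a hypothetical sketch, not Cassandra code): wrapping a stream in a `BufferedInputStream` cuts the number of read calls that reach the file by roughly the ratio of the buffer size to the application's read size, which is where the syscall and latency savings come from.

```java
import java.io.*;
import java.nio.file.*;

public class BufferedReadDemo {
    // InputStream wrapper that counts how many read() calls reach the file.
    static class CountingStream extends FilterInputStream {
        int calls = 0;
        CountingStream(InputStream in) { super(in); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Drains a 1 MiB temp file in 64-byte chunks; returns how many
     *  read() calls reached the underlying file stream. */
    static int countReads(boolean buffered) throws IOException {
        Path f = Files.createTempFile("stream", ".bin");
        Files.write(f, new byte[1 << 20]);                    // 1 MiB payload
        CountingStream counting = new CountingStream(Files.newInputStream(f));
        InputStream in = buffered
                ? new BufferedInputStream(counting, 1 << 16)  // 64 KiB internal buffer
                : counting;
        byte[] chunk = new byte[64];       // small reads, like a tight serde loop
        while (in.read(chunk) != -1) { /* discard */ }
        in.close();
        Files.delete(f);
        return counting.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered read calls: " + countReads(false));
        System.out.println("buffered read calls:   " + countReads(true));
    }
}
```

With a 64 KiB buffer the unbuffered path makes on the order of a thousand times more read calls; on high-latency storage such as EBS, each of those calls is a round trip.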

I think we need to do a few things:
 * Add an internal buffer on reads.  Maybe something like adding 
`withBuffer()` on ChannelProxy, which would wrap it with a BufferedReader.
 * Buffer writes to the network.  Constantly writing small payloads to the 
network carries a very high latency cost; we'd be better off flushing larger 
payloads less frequently.
 * Move the blocking network work to a separate thread.  We don't need to wait 
on the network transfer in order to read more data off disk.  Once we improve 
the internal read buffer, I expect this to surface as the next bottleneck, so 
let's tackle it now.
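
A minimal sketch of that third point (illustrative names, not Cassandra's actual streaming classes): the disk-read loop hands chunks to a dedicated network-writer thread through a bounded queue, so reads are never blocked on network latency, and the bounded queue gives natural back-pressure.

```java
import java.util.concurrent.*;

public class PipelinedStreamSketch {
    static final byte[] POISON = new byte[0];   // end-of-stream marker

    /** Streams diskChunks to networkWrite on a separate thread;
     *  returns the total number of bytes handed to the network. */
    public static long stream(Iterable<byte[]> diskChunks,
                              java.util.function.Consumer<byte[]> networkWrite)
            throws InterruptedException {
        // Bounded queue: if the network is slow, disk reads pause instead
        // of buffering unboundedly in memory.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(8);
        final long[] written = {0};

        Thread writer = new Thread(() -> {
            try {
                for (byte[] chunk; (chunk = queue.take()) != POISON; ) {
                    networkWrite.accept(chunk);   // blocking network send
                    written[0] += chunk.length;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "network-writer");
        writer.start();

        for (byte[] chunk : diskChunks)           // disk reads continue while
            queue.put(chunk);                     // the writer drains the queue
        queue.put(POISON);
        writer.join();
        return written[0];
    }

    public static void main(String[] args) throws InterruptedException {
        long sent = stream(
                java.util.Arrays.asList(new byte[1 << 16], new byte[1 << 16]),
                c -> { /* pretend network write */ });
        System.out.println("bytes sent: " + sent);
    }
}
```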

!image-2024-10-04-12-40-26-727.png!

  was:
CASSANDRA-15452 is introducing an internal buffer to compaction in order to 
increase throughput while reducing IOPS.  We can do the same thing with our 
streaming slow path.  There's a common misconception that the overhead comes 
from serde overhead, but I've found on a lot of devices the overhead is due to 
our read patterns. This is most commonly found on non-NVMe drives, especially 
disaggregated storage such as EBS where the latency is higher and more variable.

Attached is a perf profile showing the cost of streaming is dominated by pread. 
 The team I was working with was seeing they could stream only 12MB per 
streaming session.  Reducing the number of read operations by using internal 
buffered reads should improve this by at least 3-5x, as well as reduce CPU 
overhead from reduced system calls.

!image-2024-10-04-12-40-26-727.png!


> Use internal buffer on streaming slow path
> ------------------------------------------
>
>                 Key: CASSANDRA-19979
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19979
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: image-2024-10-04-12-40-26-727.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
