[ 
https://issues.apache.org/jira/browse/CASSANDRA-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19979:
-----------------------------------
    Description: 
CASSANDRA-15452 is introducing an internal buffer to compaction in order to 
increase throughput while reducing IOPS.  Our streaming slow path does 
something similar, but it's not optimal.  There's a common misconception that 
the cost comes from serialization/deserialization, but I've found that on a 
lot of devices the overhead is due to our read patterns. This is most common 
on non-NVMe drives, especially disaggregated storage such as EBS, where 
latency is higher and more variable.

Attached is a perf profile showing that the cost of streaming is dominated by 
pread.  The team I was working with was seeing only 12MB per streaming 
session.  Reducing the number of read operations by using larger buffered 
reads should improve this by at least 3-5x on some systems, and also reduce 
CPU overhead thanks to fewer system calls.

I think we need to do a few things:
 * Use a larger internal buffer on disk reads.
 * Buffer writes to the network.  Writing a constant stream of small values to 
the network has a very high latency cost; we'd be better off buffering and 
flushing larger payloads less frequently.
 * Move the blocking network writing to a separate thread.  We don't need to 
wait on the network transfer in order to read more data off disk.  Once we 
improve the internal read buffer, I think this will become the next 
bottleneck, so let's tackle it now.  Executors.newSingleThreadExecutor() would 
work well for this.
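Taken together, the three points could look something like the sketch below. PipelinedStream, readChunk, and the ByteArrayOutputStream standing in for the socket are all hypothetical placeholders, not Cassandra's actual streaming classes:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the caller reads large chunks from disk while a
// dedicated single-thread executor performs the blocking network writes,
// so disk reads never stall on network latency.
public class PipelinedStream {
    private static final int BUFFER_SIZE = 1 << 20; // 1 MiB read buffer

    private final ExecutorService networkWriter = Executors.newSingleThreadExecutor();
    private final ByteArrayOutputStream network = new ByteArrayOutputStream(); // stand-in for the socket

    // Simulated disk read: copy out the next large chunk, or null at EOF.
    private byte[] readChunk(byte[] source, int offset) {
        if (offset >= source.length)
            return null;
        int len = Math.min(BUFFER_SIZE, source.length - offset);
        byte[] chunk = new byte[len];
        System.arraycopy(source, offset, chunk, 0, len);
        return chunk;
    }

    public long stream(byte[] file) {
        List<Future<?>> pending = new ArrayList<>();
        int offset = 0;
        byte[] chunk;
        while ((chunk = readChunk(file, offset)) != null) {
            offset += chunk.length;
            byte[] toSend = chunk;
            // Hand the blocking write to the writer thread; keep reading disk.
            pending.add(networkWriter.submit(() -> network.write(toSend, 0, toSend.length)));
        }
        try {
            for (Future<?> f : pending)
                f.get(); // wait for the writer to drain everything
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        networkWriter.shutdown();
        return network.size(); // bytes "sent"
    }
}
```

The single writer thread keeps network writes ordered while the caller's loop stays busy reading, which is what Executors.newSingleThreadExecutor() buys us over doing the blocking write inline.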

!image-2024-10-04-12-40-26-727.png!

  was:
CASSANDRA-15452 is introducing an internal buffer to compaction in order to 
increase throughput while reducing IOPS.  We do the same thing with our 
streaming slow path, although it's not optimal.  There's a common misconception 
that the overhead comes from serde overhead, but I've found on a lot of devices 
the overhead is due to our read patterns. This is most commonly found on 
non-NVMe drives, especially disaggregated storage such as EBS where the latency 
is higher and more variable.

Attached is a perf profile showing the cost of streaming is dominated by pread. 
 The team I was working with was seeing they could stream only 12MB per 
streaming session.  Reducing the number of read operations by using larger 
buffered reads should improve this by at least 3-5x on some systems, as well as 
reduce CPU overhead from reduced system calls.

I think we need to do a few things:
 * Use a larger internal buffer on disk reads. 
 * Buffer writes to the network.  Writing constant small values to the network 
has a very high latency cost, we'd be better off flushing larger values more 
often
 * Move the blocking network part to a separate thread.  We don't need to wait 
on the network transfer in order to read more data off disk.  Once we improve 
the internal buffer on reads I think we'll see this as the next problem so 
let's tackle it now.

!image-2024-10-04-12-40-26-727.png!


> Use larger internal buffers on streaming slow path
> --------------------------------------------------
>
>                 Key: CASSANDRA-19979
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19979
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: image-2024-10-04-12-40-26-727.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
