[
https://issues.apache.org/jira/browse/CASSANDRA-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jon Haddad updated CASSANDRA-19979:
-----------------------------------
Description:
CASSANDRA-15452 is introducing an internal buffer to compaction in order to
increase throughput while reducing IOPS. Our streaming slow path does the same
kind of work, but its buffering is not optimal. There's a common misconception
that the overhead comes from serialization/deserialization (serde), but I've
found that on a lot of devices the overhead is due to our read patterns. This
is most commonly seen on non-NVMe drives, especially disaggregated storage
such as EBS, where latency is higher and more variable.
Attached is a perf profile showing that the cost of streaming is dominated by
pread. The team I was working with found they could stream only 12MB per
streaming session. Reducing the number of read operations by using larger
buffered reads should improve this by at least 3-5x on some systems, and it
should also reduce CPU overhead by making fewer system calls.
I think we need to do a few things:
* Use a larger internal buffer on disk reads.
* Buffer writes to the network. Constantly writing small values to the network
has a very high latency cost; we'd be better off batching writes and flushing
larger payloads.
* Move the blocking network write to a separate thread. We don't need to wait
on the network transfer in order to read more data off disk. Once we improve
the internal read buffer, I think this will become the next bottleneck, so
let's tackle it now. Executors.newSingleThreadExecutor() would work well for
this.
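The three changes above could be sketched roughly like this (a minimal illustration, not actual Cassandra streaming code; the class name, buffer size, and file-based source are all assumptions): reads go through a large BufferedInputStream, outgoing data is batched through a BufferedOutputStream, and the blocking write runs on a single-threaded executor so the reader never waits on the network.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: large buffered disk reads, batched network writes,
// and blocking writes offloaded to a single-threaded executor.
public class BufferedStreamSketch {
    // 1 MiB buffer: one pread services many small logical reads.
    static final int BUFFER_SIZE = 1 << 20;

    public static long stream(Path source, OutputStream network) throws Exception {
        ExecutorService writer = Executors.newSingleThreadExecutor();
        long total = 0;
        try (InputStream in = new BufferedInputStream(
                     Files.newInputStream(source), BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(network, BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            List<Future<?>> pending = new ArrayList<>();
            int n;
            while ((n = in.read(chunk)) != -1) {
                // Snapshot the chunk so the reader can refill it immediately
                // while the writer thread handles the blocking network write.
                byte[] copy = Arrays.copyOf(chunk, n);
                pending.add(writer.submit(() -> { out.write(copy); return null; }));
                total += n;
            }
            // Wait for all queued writes so errors surface before we close.
            for (Future<?> f : pending) f.get();
            out.flush();
        } finally {
            writer.shutdown();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("stream", ".bin");
        byte[] data = new byte[5 * 1024 * 1024];
        Files.write(tmp, data);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long sent = stream(tmp, sink);
        System.out.println(sent == data.length && sink.size() == data.length);
    }
}
```

Because the executor is single-threaded, writes stay in submission order, which is what a stream requires; the unbounded task queue would need a cap in real code to avoid buffering the whole file in memory.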
!image-2024-10-04-12-40-26-727.png|width=912,height=592!
> Use larger internal buffers on streaming slow path
> --------------------------------------------------
>
> Key: CASSANDRA-19979
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19979
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jon Haddad
> Priority: Normal
> Attachments: image-2024-10-04-12-40-26-727.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]