What I understand is that block storage differs somewhat between cloud
platforms. Most visibly, the default read-ahead size is not the same
everywhere. For example, AWS EBS seems to use 256K, and Alibaba Cloud
seems to use 512K (if I remember correctly).
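To compare platforms, testers could record the kernel's current read-ahead value alongside their results. Below is a minimal Python sketch. It assumes a Linux host that exposes `/sys/block/<device>/queue/read_ahead_kb`, with the value given in KiB. The `sysfs_root` parameter is a hypothetical hook that exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict


def read_ahead_kb(sysfs_root: str = "/sys/block") -> Dict[str, int]:
    """Return the kernel read-ahead size (KiB) for each block device.

    Assumes the Linux sysfs layout <sysfs_root>/<device>/queue/read_ahead_kb.
    Devices without that file are skipped; a missing root yields {}.
    """
    sizes: Dict[str, int] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return sizes
    for dev in sorted(root.iterdir()):
        # Entries under /sys/block are symlinks; Path follows them transparently.
        ra_file = dev / "queue" / "read_ahead_kb"
        if ra_file.is_file():
            sizes[dev.name] = int(ra_file.read_text().strip())
    return sizes
```

Calling `read_ahead_kb()` on a test host returns a mapping from device name to its read-ahead size in KiB. That mapping can be pasted next to the benchmark numbers.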

Just like CASSANDRA-19488, please share the test method so that others can
help run the tests and report their results.
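A reproducible test method could start from a script that anyone can run on their own platform. This is a minimal Python sketch, not the procedure used for CASSANDRA-15452 or CASSANDRA-19488. It reads a scratch file sequentially with several request sizes and reports how many read calls each size needed and how long the reads took. The 64 MiB file size and the request sizes are arbitrary assumptions. A freshly written file sits in the page cache, so the timings only say something about the disk if you drop the cache first or use a file larger than RAM. The `directory` argument should point at the volume under test.

```python
import os
import tempfile
import time
from typing import List, Optional, Tuple


def timed_sequential_read(path: str, request_size: int) -> Tuple[int, float]:
    """Read `path` front to back using os.read(fd, request_size).

    Returns (number of os.read calls that returned data, elapsed seconds).
    """
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, request_size)
            if not chunk:  # empty bytes means end of file
                break
            calls += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return calls, elapsed


def run(directory: Optional[str] = None,
        file_size: int = 64 * 1024 * 1024,
        request_sizes: Tuple[int, ...] = (4 * 1024, 64 * 1024, 256 * 1024),
        ) -> List[Tuple[int, int, float]]:
    """Write a scratch file in `directory`, then time reads at each request size.

    None uses the system temp directory. Returns (request_size, read_calls,
    seconds) tuples and prints one line per request size.
    """
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        f.write(os.urandom(file_size))
        path = f.name
    results = []
    try:
        for size in request_sizes:
            calls, secs = timed_sequential_read(path, size)
            results.append((size, calls, secs))
            print(f"request={size // 1024:>4} KiB  reads={calls:>6}  time={secs:.3f}s")
    finally:
        os.unlink(path)  # always remove the scratch file
    return results


if __name__ == "__main__":
    run()
```

Each participant would run it on the volume under test, for example `run("/path/on/volume")` (a placeholder path). They would then post the printed lines together with the instance type, volume type, and read-ahead setting.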

Jon Haddad <j...@rustyrazorblade.com> wrote on Thu, Feb 13, 2025 at 08:30:

> Can you elaborate why?  This would be several hundred hours of work and
> would cost me thousands of $$ to perform.
>
> Filesystems and block devices are well understood.  Could you give me an
> example of what you think might be different here?  This is already one of
> the most well tested and documented performance patches ever contributed to
> the project.
>
> On Wed, Feb 12, 2025 at 4:26 PM guo Maxwell <cclive1...@gmail.com> wrote:
>
>> I think it should be tested on most cloud platforms (at least AWS, Azure,
>> and GCP) before being merged into 5.0, just like CASSANDRA-19488.
>>
>> Paulo Motta <pa...@apache.org> wrote on Thu, Feb 13, 2025 at 6:10 AM:
>>
>>> I'm looking forward to these improvements; compaction needs some TLC. :-)
>>> A couple of questions:
>>>
>>> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My
>>> only concern is if this is an optimization for EBS that can be a
>>> deoptimization for other environments.
>>>
>>> Are there reproducible scripts that anyone can run to verify the
>>> improvements in their own environments? This could help alleviate any
>>> concerns and build confidence in introducing a performance improvement in
>>> a patch release.
>>>
>>> I have not read the ticket in detail, so apologies if this was already
>>> discussed there or elsewhere.
>>>
>>> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>> >
>>> > Hey folks,
>>> >
>>> > Over the last 9 months Jordan and I have worked on CASSANDRA-15452
>>> > [1]. The TL;DR is that we're internalizing a read-ahead buffer so that
>>> > compaction and range reads issue fewer requests to disk. This results in
>>> > far fewer system calls (roughly a 16x reduction) and, on systems with
>>> > higher read latency, a significant improvement in compaction throughput.
>>> > We've tested several different EBS configurations and found that it
>>> > delivers up to a 10x improvement when read-ahead is optimized to minimize
>>> > read latency. I worked directly with AWS and the EBS team on this, and on
>>> > the Best Practices for C* on EBS guide [2] I wrote for them. I've
>>> > performance tested this patch extensively with hundreds of billions of
>>> > operations across several clusters and thousands of compactions. It has
>>> > less of an impact on local NVMe, since the p99 latency there is already
>>> > 10-30x lower than on EBS (100µs vs 1-3ms), and you can do hundreds of
>>> > thousands of IOPS vs a max of 16K.
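A toy model can show how an internal read-ahead buffer cuts the number of disk requests, and why that matters more on high-latency storage. This is a minimal Python sketch, not the CASSANDRA-15452 implementation. `ReadAheadReader` serves small sequential reads from a larger buffer. `CountingSource` is a hypothetical in-memory stand-in that counts the requests reaching the "disk". `serial_read_mib_per_s` assumes exactly one request in flight at a time and ignores transfer time. The 4 KiB read size and 64 KiB buffer are assumed values, chosen only to make the ratio 16x.

```python
from typing import Optional


class CountingSource:
    """In-memory byte source that counts read() calls (stand-in for disk requests)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self.calls = 0

    def read(self, n: int) -> bytes:
        self.calls += 1
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


class ReadAheadReader:
    """Serve small sequential reads from a larger read-ahead buffer."""

    def __init__(self, source, buffer_size: int) -> None:
        self._source = source
        self._buffer_size = buffer_size
        self._buf = b""
        self._pos = 0

    def read(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if self._pos == len(self._buf):
                # Buffer exhausted: fetch the next large chunk in one request.
                self._buf = self._source.read(self._buffer_size)
                self._pos = 0
                if not self._buf:
                    break  # end of data
            take = min(n - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return bytes(out)


def count_requests(total: int, read_size: int, buffer_size: Optional[int]) -> int:
    """Read `total` bytes in `read_size` pieces; return requests seen by the source.

    buffer_size=None reads the source directly, otherwise via ReadAheadReader.
    """
    source = CountingSource(bytes(total))
    reader = source if buffer_size is None else ReadAheadReader(source, buffer_size)
    while reader.read(read_size):
        pass
    return source.calls


def serial_read_mib_per_s(request_kib: float, latency_ms: float) -> float:
    """Throughput with one request in flight at a time: request size / latency."""
    return (request_kib / 1024) / (latency_ms / 1000)
```

With 1 MiB of data, `count_requests(1024 * 1024, 4096, None)` issues 257 source reads: 256 that return data plus one end-of-file read. `count_requests(1024 * 1024, 4096, 64 * 1024)` issues 17. Under the latency model, 4 KiB requests at 1 ms each cap a single reader at about 3.9 MiB/s, while 64 KiB requests reach 62.5 MiB/s. At 0.1 ms the same 4 KiB requests already reach about 39 MiB/s, which is consistent with the smaller gain on local NVMe.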
>>> >
>>> > Related to this, Branimir wrote CASSANDRA-20092 [3], which significantly
>>> > improves compaction by avoiding reading the partition index.
>>> > CASSANDRA-20092 has already been merged to trunk [4].
>>> >
>>> > I think we should merge both of these patches into 5.0, as the perf
>>> > improvement should allow teams to increase the density of EBS-backed C*
>>> > clusters by 2-5x, driving cost way down. There are a lot of teams running
>>> > C* on EBS now. I'm currently working with one that's bottlenecked on
>>> > maxed-out EBS gp3 storage. I propose we merge both, because without
>>> > CASSANDRA-20092, we won't get the performance improvements in
>>> > CASSANDRA-15452 with BTI, only with the BIG format. I've tested BTI in
>>> > other situations and found it to be far more performant than BIG.
>>> >
>>> > If we were looking at a small win, I wouldn't care much, but since these
>>> > patches, combined with UCS, allow more teams to run C* on EBS at more than
>>> > 10TB per node, I think it's worth doing now.
>>> >
>>> > Thanks in advance,
>>> > Jon
>>> >
>>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
>>> > [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
>>> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
>>> > [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
>>> >
>>>
>>
