My understanding is that block storage differs somewhat across cloud platforms. The most visible difference is that the default read-ahead size is not the same everywhere. For example, AWS EBS seems to use 256K, while Alibaba Cloud seems to use 512K (if I remember correctly).
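One cheap way to compare environments before running any benchmark is to record each volume's read-ahead setting. This is a minimal sketch. It assumes a Linux host and assumes the sizes mentioned above refer to the kernel's per-device read-ahead, which Linux exposes as `/sys/block/<device>/queue/read_ahead_kb`. The helper name `read_ahead_kb` is hypothetical.

```python
from pathlib import Path


def read_ahead_kb(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel read-ahead setting, in KiB, for a whole-disk device.

    Assumption: a Linux sysfs layout of <root>/<device>/queue/read_ahead_kb.
    `blockdev --getra /dev/<device>` reports the same setting in 512-byte sectors.
    """
    path = Path(sysfs_root) / device / "queue" / "read_ahead_kb"
    # The file holds a single integer followed by a newline.
    return int(path.read_text().strip())
```

To use it, run something like `read_ahead_kb("nvme1n1")` on each platform's data volume. The device name here is only a placeholder. Recording the value next to any benchmark results makes cross-cloud comparisons easier to interpret.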
As with CASSANDRA-19488, we could publish the test method, see who can help run the tests, and share the results.

On Thu, Feb 13, 2025 at 08:30, Jon Haddad <j...@rustyrazorblade.com> wrote:
> Can you elaborate why? This would be several hundred hours of work and would cost me thousands of $$ to perform.
>
> Filesystems and block devices are well understood. Could you give me an example of what you think might be different here? This is already one of the most well tested and documented performance patches ever contributed to the project.
>
> On Wed, Feb 12, 2025 at 4:26 PM guo Maxwell <cclive1...@gmail.com> wrote:
>
>> I think it should be tested on most cloud platforms (at least AWS, Azure, GCP) before being merged into 5.0, just like CASSANDRA-19488.
>>
>> On Thu, Feb 13, 2025 at 6:10 AM, Paulo Motta <pa...@apache.org> wrote:
>>
>>> I'm looking forward to these improvements, compaction needs tlc. :-)
>>> A couple of questions:
>>>
>>> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My only concern is if this is an optimization for EBS that can be a deoptimization for other environments.
>>>
>>> Are there reproducible scripts that anyone can run to verify the improvements in their own environments? This could help alleviate any concerns and gain confidence to introduce a perf. improvement in a patch release.
>>>
>>> I have not read the ticket in detail, so apologies if this was already discussed there or elsewhere.
>>>
>>> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>> >
>>> > Hey folks,
>>> >
>>> > Over the last 9 months Jordan and I have worked on CASSANDRA-15452 [1]. The TL;DR is that we're internalizing a read ahead buffer to allow us to do fewer requests to disk during compaction and range reads. This results in far fewer system calls (roughly 16x reduction) and on systems with higher read latency, a significant improvement in compaction throughput.
>>> > We've tested several different EBS configurations and found it delivers up to a 10x improvement when read ahead is optimized to minimize read latency. I worked with AWS and the EBS team directly on this and the Best Practices for C* on EBS [2] I wrote for them. I've performance tested this patch extensively with hundreds of billions of operations across several clusters and thousands of compactions. It has less of an impact on local NVMe, since the p99 latency is already 10-30x less than what you see on EBS (100 micros vs 1-3ms), and you can do hundreds of thousands of IOPS vs a max of 16K.
>>> >
>>> > Related to this, Branimir wrote CASSANDRA-20092 [3], which significantly improves compaction by avoiding reading the partition index. CASSANDRA-20092 has been merged to trunk already [4].
>>> >
>>> > I think we should merge both of these patches into 5.0, as the perf improvement should allow teams to increase the density of EBS-backed C* clusters by 2-5x, driving cost way down. There are a lot of teams running C* on EBS now. I'm currently working with one that's bottlenecked on maxed out EBS GP3 storage. I propose we merge both, because without CASSANDRA-20092, we won't get the performance improvements in CASSANDRA-15452 with BTI, only BIG format. I've tested BTI in other situations and found it to be far more performant than BIG.
>>> >
>>> > If we were looking at a small win, I wouldn't care much, but since these patches, combined with UCS, allow more teams to run C* on EBS at >10TB / node, I think it's worth doing now.
>>> >
>>> > Thanks in advance,
>>> > Jon
>>> >
>>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
>>> > [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
>>> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
>>> > [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
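The argument for a larger internal read-ahead buffer on high-latency storage comes down to simple arithmetic. A sequential scan issues roughly one request per read, so the request count is file size divided by read size. With one request in flight, throughput is roughly read size divided by per-request latency. The sketch below is illustrative only. The 1 GiB file, the 16 KiB "without buffer" read, and the 256 KiB "with buffer" read are hypothetical values, not settings taken from CASSANDRA-15452. The 1 ms latency is the low end of the EBS range quoted in the thread.

```python
import math

KIB = 1024
MIB = 1024 * KIB


def read_requests(file_bytes: int, read_size: int) -> int:
    """Sequential read requests needed to scan a file of file_bytes."""
    return math.ceil(file_bytes / read_size)


def latency_bound_mib_per_s(read_size: int, latency_s: float) -> float:
    """Throughput with a single outstanding request: bytes per request / latency."""
    return read_size / latency_s / MIB


# Hypothetical parameters for illustration only (not the patch's actual settings).
FILE_SIZE = 1024 * MIB   # a 1 GiB data file
SMALL_READ = 16 * KIB    # per-chunk read without an internal read-ahead buffer
LARGE_READ = 256 * KIB   # read size with an internal read-ahead buffer
EBS_LATENCY = 0.001      # 1 ms per request

small = read_requests(FILE_SIZE, SMALL_READ)
large = read_requests(FILE_SIZE, LARGE_READ)
print(f"{small} reads vs {large} reads ({small // large}x fewer)")
print(f"{latency_bound_mib_per_s(SMALL_READ, EBS_LATENCY):.0f} MiB/s vs "
      f"{latency_bound_mib_per_s(LARGE_READ, EBS_LATENCY):.0f} MiB/s")
```

Under these assumptions the script reports 16x fewer reads and roughly 16 vs 250 MiB/s before any provisioned throughput limit on the volume applies. The same arithmetic shows why the gain shrinks on local NVMe, where per-request latency is far lower. The model ignores the kernel's own read-ahead and any queue depth above one. It is an intuition for the trend, not a prediction of measured results.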