[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729429#comment-14729429 ]
Matt Stump commented on CASSANDRA-10249: ---------------------------------------- +1 on configurable > Reduce over-read for standard disk io by 16x > -------------------------------------------- > > Key: CASSANDRA-10249 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10249 > Project: Cassandra > Issue Type: Improvement > Reporter: Albert P Tobey > Fix For: 2.1.x > > Attachments: patched-2.1.9-dstat-lvn10.png, > stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png > > > On read workloads, Cassandra 2.1 reads drastically more data than it emits > over the network. This causes problems throughput the system by wasting disk > IO and causing unnecessary GC. > I have reproduce the issue on clusters and locally with a single instance. > The only requirement to reproduce the issue is enough data to blow through > the page cache. The default schema and data size with cassandra-stress is > sufficient for exposing the issue. > With stock 2.1.9 I regularly observed anywhere from 300:1 to 500 > disk:network ratio. That is to say, for 1MB/s of network IO, Cassandra was > doing 300-500MB/s of disk reads, saturating the drive. > After applying this patch for standard IO mode > https://gist.github.com/tobert/10c307cf3709a585a7cf the ratio fell to around > 100:1 on my local test rig. Latency improved considerably and GC became a lot > less frequent. > I tested with 512 byte reads as well, but got the same performance, which > makes sense since all HDD and SSD made in the last few years have a 4K block > size (many of them lie and say 512). > I'm re-running the numbers now and will post them tomorrow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)