[ https://issues.apache.org/jira/browse/CASSANDRA-19494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jon Haddad updated CASSANDRA-19494:
-----------------------------------
Resolution: Duplicate
Status: Resolved (was: Triage Needed)
Will be resolved as part of CASSANDRA-15452, very exciting.
> Optimize I/O during table scans
> -------------------------------
>
> Key: CASSANDRA-19494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19494
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jon Haddad
> Priority: Normal
> Attachments: reads.txt
>
>
> The storage engine reads chunk by chunk during table scans. We'd be much
> better off performing larger I/O operations into an internal buffer, which
> would mean fewer I/O operations overall and far fewer system calls.
> For example, consider a scan against this table:
> {noformat}
> CREATE TABLE easy_cass_stress.keyvalue (
>     key text PRIMARY KEY,
>     value text
> ) WITH additional_write_policy = '99p'
>     AND allow_auto_snapshot = true
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND memtable = 'default'
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND incremental_backups = true
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';{noformat}
> I see the following I/O activity (a sample only; see the attached reads.txt
> for a full accounting of all reads):
>
> {noformat}
> TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
> 16:59:23 ReadStage-2    2523   R 15051   0           0.02 nb-6-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15049   0           0.01 nb-8-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15025   0           0.01 nb-5-big-Data.db
> 16:59:23 ReadStage-2    2523   R 15064   0           0.01 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15051   0           0.01 nb-6-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15049   0           0.01 nb-8-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15025   0           0.01 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   0           0.00 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   14          0.01 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15051   0           0.01 nb-6-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15049   0           0.00 nb-8-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   14          0.00 nb-5-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15064   0           0.00 nb-7-big-Data.db
> 16:59:25 ReadStage-2    2523   R 15012   29          0.01 nb-5-big-Data.db{noformat}
> along with a sample of our off-CPU time (after dropping caches):
> {noformat}
> cpudist -O -p $(cassandra-pid) -m 1 30
>     msecs      : count    distribution
>     0 -> 1     : 5259     |****************************************|
>     2 -> 3     : 486      |***                                     |
>     4 -> 7     : 0        |                                        |
>     8 -> 15    : 1        |                                        |
>    16 -> 31    : 0        |                                        |
>    32 -> 63    : 29       |                                        |
>    64 -> 127   : 77       |                                        |
>   128 -> 255   : 4        |                                        |
>   256 -> 511   : 6        |                                        |
>   512 -> 1023  : 6        |                                        |{noformat}
> We pay a serious throughput penalty for this excessive I/O.
> We should be able to leverage the work in CASSANDRA-15452 for this.
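The idea in the description, replacing one read per chunk with fewer, larger reads into an internal buffer, can be illustrated with a small standalone Java sketch. This is not Cassandra's reader and not the CASSANDRA-15452 design; the class and method names (`ChunkedScanSketch`, `scanPerChunk`, `scanBuffered`, `compare`) and the 1 MiB buffer size are hypothetical. The 16 KiB chunk size mirrors `chunk_length_in_kb = 16` from the schema, but the sketch assumes fixed-size chunks, whereas the traced reads are compressed chunks of 15,012 to 15,064 bytes each.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical illustration only: not Cassandra's reader and not the CASSANDRA-15452 design.
public class ChunkedScanSketch {

    // Number of FileChannel.read calls issued, plus a CRC32 of every byte handed out as a chunk.
    public record ScanResult(long readCalls, long checksum) {}

    // One positional read per chunk, similar in shape to the per-chunk reads in the trace.
    public static ScanResult scanPerChunk(FileChannel ch, int chunkSize) throws IOException {
        CRC32 crc = new CRC32();
        ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
        long size = ch.size();
        long calls = 0;
        for (long pos = 0; pos < size; pos += chunkSize) {
            chunk.clear();
            chunk.limit((int) Math.min(chunkSize, size - pos));
            while (chunk.hasRemaining()) { // loops again only after a short read
                if (ch.read(chunk, pos + chunk.position()) < 0) throw new IOException("unexpected EOF");
                calls++;
            }
            chunk.flip();
            crc.update(chunk);
        }
        return new ScanResult(calls, crc.getValue());
    }

    // Fills a large buffer with as few reads as possible, then serves chunks out of memory.
    public static ScanResult scanBuffered(FileChannel ch, int chunkSize, int bufferSize) throws IOException {
        if (bufferSize < chunkSize) throw new IllegalArgumentException("buffer must hold at least one chunk");
        CRC32 crc = new CRC32();
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        buf.flip(); // start empty, in "consume" mode
        long size = ch.size();
        long filePos = 0;  // next file offset to fetch from disk
        long consumed = 0; // bytes already handed out as chunks
        long calls = 0;
        while (consumed < size) {
            int want = (int) Math.min(chunkSize, size - consumed);
            if (buf.remaining() < want) {
                buf.compact(); // keep any unconsumed tail, then top the buffer up
                while (buf.hasRemaining() && filePos < size) {
                    int n = ch.read(buf, filePos);
                    if (n < 0) throw new IOException("unexpected EOF");
                    filePos += n;
                    calls++;
                }
                buf.flip();
            }
            ByteBuffer chunk = buf.slice(); // view of unconsumed bytes: no copy, no system call
            chunk.limit(want);
            crc.update(chunk);
            buf.position(buf.position() + want);
            consumed += want;
        }
        return new ScanResult(calls, crc.getValue());
    }

    // Writes data to a temporary file and scans it both ways.
    public static ScanResult[] compare(byte[] data, int chunkSize, int bufferSize) throws IOException {
        Path file = Files.createTempFile("scan-sketch", ".db");
        try {
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                return new ScanResult[] {scanPerChunk(ch, chunkSize), scanBuffered(ch, chunkSize, bufferSize)};
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8 * 1024 * 1024]; // 8 MiB of sample bytes
        new Random(42).nextBytes(data);
        // 16 KiB matches chunk_length_in_kb = 16; the 1 MiB buffer size is an arbitrary assumption.
        ScanResult[] r = compare(data, 16 * 1024, 1024 * 1024);
        System.out.println("per-chunk read calls: " + r[0].readCalls());
        System.out.println("buffered read calls:  " + r[1].readCalls());
        System.out.println("same bytes consumed:  " + (r[0].checksum() == r[1].checksum()));
    }
}
```

On a local file system, the per-chunk scan of the 8 MiB file should need roughly one read per 16 KiB chunk, while the buffered scan needs roughly one per 1 MiB refill, and both checksums should match; exact counts depend on how many bytes each `read` call returns. The `compact()` call only matters when a chunk straddles the buffer boundary, which never happens here with fixed chunks that divide the buffer evenly but would with variable-length compressed chunks.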
--
This message was sent by Atlassian Jira
(v8.20.10#820010)