[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487564#comment-14487564 ]
Tyler Hobbs commented on CASSANDRA-8938:
----------------------------------------
[~eanujwa] so, there are two separate issues here: the read count and latency
metrics (what you see in cfstats), and the hotness measurements for sstables.
We don't have to update them the same way.
Regarding metrics, I would be okay with having separate range scan count and
latency metrics. We need to decide exactly how those metrics behave, though (e.g.
increment the read count for each full scan, or each partition scanned, or each
row scanned?).
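To make the three options concrete, here is a minimal sketch of how a separate range scan counter would grow under each one. This is not Cassandra code: {{ScanResult}}, {{range_scan_increment}} and the policy names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanResult:
    # Hypothetical shape: number of rows returned from each partition
    # touched by a single range scan.
    rows_per_partition: List[int]


def range_scan_increment(scan: ScanResult, policy: str) -> int:
    """Return how much a range scan counter would grow for one scan."""
    if policy == "per_scan":
        return 1
    if policy == "per_partition":
        return len(scan.rows_per_partition)
    if policy == "per_row":
        return sum(scan.rows_per_partition)
    raise ValueError(f"unknown policy: {policy}")


# One scan that touched three partitions holding 3, 1 and 5 rows.
scan = ScanResult(rows_per_partition=[3, 1, 5])
increments = {
    policy: range_scan_increment(scan, policy)
    for policy in ("per_scan", "per_partition", "per_row")
}
```

For this one scan the counter would grow by 1, 3 or 9 depending on the option chosen, so the same workload can report very different numbers in cfstats.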
For the hotness measurements, I do _not_ think we should increment the read
count for each row (or even partition) in a scan. After the removal of
{{cold_reads_to_omit}} in CASSANDRA-8860, the hotness measurements do two
things: prioritize compaction of certain sstables when there are multiple
sstable sets that can be compacted, and determine the amount of space to
allocate for the index summary for an sstable. Since the index summary is far
more important for partition reads than scans, I think we can agree that scans
shouldn't have a big impact on this. For prioritizing compaction, the absolute
read numbers don't matter, only how large they are relative to each other. So,
incrementing the count by one for each scan should be sufficient to handle a
scan-only workload. If the workload is mixed, I think it's okay if partition
reads have a greater influence on compaction prioritization than range scans do.
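A toy model of the prioritization argument, assuming the compaction strategy simply prefers the candidate set with the highest total read count. The real strategy code is more involved; {{hottest}} and the sample counts below are hypothetical.

```python
from typing import Dict, List


def hottest(candidates: Dict[str, List[int]]) -> str:
    """Pick the candidate sstable set with the largest total read count."""
    return max(candidates, key=lambda name: sum(candidates[name]))


# Scan-only workload: each scan adds 1 to every sstable it touches.
scan_only = {"set_a": [4, 4], "set_b": [1, 1]}

# The same workload if every scan had instead added 1000 (for example,
# one increment per row). Only the absolute numbers change.
scaled = {name: [c * 1000 for c in counts] for name, counts in scan_only.items()}

# Mixed workload: set_b also receives 50 partition reads per sstable.
mixed = {"set_a": [4, 4], "set_b": [1 + 50, 1 + 50]}

choices = (hottest(scan_only), hottest(scaled), hottest(mixed))
```

In this model the scan-only and scaled cases pick the same set, since only the relative sizes matter, while in the mixed case the partition reads outweigh the scans.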
> Full Row Scan does not count towards Reads
> ------------------------------------------
>
> Key: CASSANDRA-8938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
> Project: Cassandra
> Issue Type: Bug
> Components: API, Core, Tools
> Environment: Unix, Cassandra 2.0.3
> Reporter: Amit Singh Chowdhery
> Assignee: Marcus Eriksson
> Priority: Minor
> Labels: none
>
> When a CQL SELECT statement is executed with a WHERE clause, Read Count is
> incremented in cfstats of the column family. But when a full row scan is
> done using a SELECT statement without a WHERE clause, Read Count is not
> incremented.
> Similarly, when using Size Tiered Compaction, if we do a full row scan using
> Hector RangeslicesQuery, Read Count is not incremented in cfstats, and Cassandra
> still considers all sstables as cold and does not trigger compaction for
> them. If we fire MultigetSliceQuery, Read Count is incremented and sstables
> become hot, triggering compaction of these sstables.
> Expected Behavior:
> 1. Read Count must be incremented by number of rows read during a full row
> scan done using CQL SELECT statement or Hector RangeslicesQuery.
> 2. Size Tiered compaction must consider all sstables as Hot after a full row
> scan.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)