[
https://issues.apache.org/jira/browse/CASSANDRA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855711#comment-15855711
]
Benjamin Lerer commented on CASSANDRA-13120:
--------------------------------------------
The trace and histogram output are effectively wrong. The problem is caused by
the fact that the code within {{SinglePartitionReadCommand}} is not aware of
which SSTables are skipped due to the Bloom filters.
I will try to find a way to correct that problem.
> Trace and Histogram output misleading
> -------------------------------------
>
> Key: CASSANDRA-13120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13120
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Adam Hattrell
> Assignee: Benjamin Lerer
> Priority: Minor
>
> If we look at the following output:
> {noformat}
> [centos@cassandra-c-3]$ nodetool getsstables -- keyspace table
> 60ea4399-6b9f-4419-9ccb-ff2e6742de10
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647146-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647147-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647145-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647152-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647157-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-648137-big-Data.db
> {noformat}
> We can see that this key value appears in just 6 sstables. However, when we
> run a select against the table and key we get:
> {noformat}
> Tracing session: a6c81330-d670-11e6-b00b-c1d403fd6e84
> activity
> | timestamp | source
> | source_elapsed
> -------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
>
> Execute CQL3 query | 2017-01-09 13:36:40.419000 |
> 10.200.254.141 | 0
> Parsing SELECT * FROM keyspace.table WHERE id =
> 60ea4399-6b9f-4419-9ccb-ff2e6742de10; [SharedPool-Worker-2] |
> 2017-01-09 13:36:40.419000 | 10.200.254.141 | 104
>
> Preparing statement [SharedPool-Worker-2] | 2017-01-09 13:36:40.419000 |
> 10.200.254.141 | 220
> Executing single-partition query on
> table [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 |
> 10.200.254.141 | 450
> Acquiring
> sstable references [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 |
> 10.200.254.141 | 477
> Bloom filter allows skipping
> sstable 648146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 |
> 10.200.254.141 | 496
> Bloom filter allows skipping
> sstable 648145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 503
> Key cache hit for
> sstable 648140 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 513
> Bloom filter allows skipping
> sstable 648135 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 520
> Bloom filter allows skipping
> sstable 648130 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 526
> Bloom filter allows skipping
> sstable 648048 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 530
> Bloom filter allows skipping
> sstable 647749 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 535
> Bloom filter allows skipping
> sstable 647404 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 540
> Key cache hit for
> sstable 647145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 548
> Key cache hit for
> sstable 647146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 |
> 10.200.254.141 | 556
> Key cache hit for
> sstable 647147 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 564
> Bloom filter allows skipping
> sstable 647148 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 570
> Bloom filter allows skipping
> sstable 647149 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 575
> Bloom filter allows skipping
> sstable 647150 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 580
> Bloom filter allows skipping
> sstable 647151 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 585
> Key cache hit for
> sstable 647152 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 591
> Bloom filter allows skipping
> sstable 647153 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 597
> Bloom filter allows skipping
> sstable 647154 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 601
> Bloom filter allows skipping
> sstable 647155 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 606
> Bloom filter allows skipping
> sstable 647156 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 611
> Key cache hit for
> sstable 647157 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 |
> 10.200.254.141 | 617
> Bloom filter allows skipping
> sstable 647158 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 |
> 10.200.254.141 | 623
> Skipped 0/22 non-slice-intersecting sstables, included 0
> due to tombstones [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 |
> 10.200.254.141 | 644
> Merging data from memtables
> and 22 sstables [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 |
> 10.200.254.141 | 654
> Read 9 live and 0
> tombstone cells [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 |
> 10.200.254.141 | 732
>
> Request complete | 2017-01-09 13:36:40.419808 |
> 10.200.254.141 | 808
> {noformat}
> You'll note we claim not to have skipped any files due to bloom filters -
> even though we know the data is only in 6 files.
> CFHistograms also report that we're hitting every sstables:
> {noformat}
> Percentile SSTables Write Latency Read Latency Partition Size Cell Count
> (micros) (micros) (bytes)
> 50% 24.00 14.24 182.79 103 1
> 75% 24.00 17.08 315.85 149 2
> 95% 24.00 20.50 7007.51 372 7
> 98% 24.00 24.60 10090.81 642 12
> 99% 24.00 29.52 12108.97 770 14
> Min 21.00 3.31 29.52 43 0
> Max 29.00 1358.10 62479.63 1597 35
> {noformat}
> Code for the read is here:
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L561
> We seem to iterate over all the sstables and increment the metric as part of
> that iteration.
> Either the reporting is incorrect - or we should maybe check the bloom
> filters first and then iterate the tombstones after?
> In this particular case we were using TWCS which makes the problem more
> apparent. TWCS guarantees that we'll keep more sstables in an un-merged
> state. With STCS we have to search them all, but most of them should be
> merged together if Compaction is keeping up. LCS the read path is restricted
> which will disguise the impact.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)