[ 
https://issues.apache.org/jira/browse/CASSANDRA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855844#comment-15855844
 ] 

Nenad edited comment on CASSANDRA-13120 at 2/7/17 12:04 PM:
------------------------------------------------------------

Basically counter is incremented (_sstablesIterated_) on each loop no matter if 
Bloom Filter says it should be skipped or not. I was waiting for someone to 
check if that is intended or not.

I have suggestion how we can fix that (I can create a patch as well). Out of 
all sstables candidates for a file, only sstables which return non null from 
[BigTableReader| 
https://github.com/apache/cassandra/blob/af3fe39dcabd9ef77a00309ce6741268423206df/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java]
 will add up to result. Read path and this code is executed from [this 
line|https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L584].

We can introduce counter _sstablesWithData_ if _sstablesIterated_ is used 
somewhere else, and increment _sstablesWithData_ only when _iter_ has non-null 
result returned from BigTableReader.

What do you think? That would give correct number of sstables based on read 
path including Bloom Filter.






was (Author: nbozicns):
Basically counter is incremented ({noformat}sstablesIterated{noformat}) on each 
loop no matter if Bloom Filter says it should be skipped or not. I was waiting 
for someone to check if that is intended or not.

I have suggestion how we can fix that (I can create a patch as well). Out of 
all sstables candidates for a file, only sstables which return non null from 
BigTableReader 
(https://github.com/apache/cassandra/blob/af3fe39dcabd9ef77a00309ce6741268423206df/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java)
 will add up to result. Read path and this code is executed from this line 
(https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L584).

We can introduce counter {noformat}sstablesWithData{noformat} if 
{noformat}sstablesIterated{noformat} is used somewhere else, and increment 
{noformat}sstablesWithData{noformat} only when {noformat}iter{noformat} has 
non-null result returned from BigTableReader.

What do you think? That would give correct number of sstables based on read 
path including Bloom Filter.





> Trace and Histogram output misleading
> -------------------------------------
>
>                 Key: CASSANDRA-13120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13120
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Hattrell
>            Assignee: Benjamin Lerer
>            Priority: Minor
>
> If we look at the following output:
> {noformat}
> [centos@cassandra-c-3]$ nodetool getsstables -- keyspace table 
> 60ea4399-6b9f-4419-9ccb-ff2e6742de10
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647146-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647147-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647145-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647152-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647157-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-648137-big-Data.db
> {noformat}
> We can see that this key value appears in just 6 sstables.  However, when we 
> run a select against the table and key we get:
> {noformat}
> Tracing session: a6c81330-d670-11e6-b00b-c1d403fd6e84
>  activity                                                                     
>                                      | timestamp                  | source    
>      | source_elapsed
> -------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
>                                                                               
>                   Execute CQL3 query | 2017-01-09 13:36:40.419000 | 
> 10.200.254.141 |              0
>  Parsing SELECT * FROM keyspace.table WHERE id = 
> 60ea4399-6b9f-4419-9ccb-ff2e6742de10; [SharedPool-Worker-2]       | 
> 2017-01-09 13:36:40.419000 | 10.200.254.141 |            104
>                                                                          
> Preparing statement [SharedPool-Worker-2] | 2017-01-09 13:36:40.419000 | 
> 10.200.254.141 |            220
>                                         Executing single-partition query on 
> table [SharedPool-Worker-1]            | 2017-01-09 13:36:40.419000 | 
> 10.200.254.141 |            450
>                                                                 Acquiring 
> sstable references [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | 
> 10.200.254.141 |            477
>                                                  Bloom filter allows skipping 
> sstable 648146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | 
> 10.200.254.141 |            496
>                                                  Bloom filter allows skipping 
> sstable 648145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            503
>                                                             Key cache hit for 
> sstable 648140 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            513
>                                                  Bloom filter allows skipping 
> sstable 648135 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            520
>                                                  Bloom filter allows skipping 
> sstable 648130 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            526
>                                                  Bloom filter allows skipping 
> sstable 648048 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            530
>                                                  Bloom filter allows skipping 
> sstable 647749 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            535
>                                                  Bloom filter allows skipping 
> sstable 647404 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            540
>                                                             Key cache hit for 
> sstable 647145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            548
>                                                             Key cache hit for 
> sstable 647146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 
> 10.200.254.141 |            556
>                                                             Key cache hit for 
> sstable 647147 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            564
>                                                  Bloom filter allows skipping 
> sstable 647148 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            570
>                                                  Bloom filter allows skipping 
> sstable 647149 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            575
>                                                  Bloom filter allows skipping 
> sstable 647150 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            580
>                                                  Bloom filter allows skipping 
> sstable 647151 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            585
>                                                             Key cache hit for 
> sstable 647152 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            591
>                                                  Bloom filter allows skipping 
> sstable 647153 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            597
>                                                  Bloom filter allows skipping 
> sstable 647154 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            601
>                                                  Bloom filter allows skipping 
> sstable 647155 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            606
>                                                  Bloom filter allows skipping 
> sstable 647156 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            611
>                                                             Key cache hit for 
> sstable 647157 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 
> 10.200.254.141 |            617
>                                                  Bloom filter allows skipping 
> sstable 647158 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 
> 10.200.254.141 |            623
>                   Skipped 0/22 non-slice-intersecting sstables, included 0 
> due to tombstones [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 
> 10.200.254.141 |            644
>                                                  Merging data from memtables 
> and 22 sstables [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 
> 10.200.254.141 |            654
>                                                            Read 9 live and 0 
> tombstone cells [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 
> 10.200.254.141 |            732
>                                                                               
>                     Request complete | 2017-01-09 13:36:40.419808 | 
> 10.200.254.141 |            808
> {noformat}
> You'll note we claim not to have skipped any files due to bloom filters - 
> even though we know the data is only in 6 files.
> CFHistograms also report that we're hitting every sstables:
> {noformat}
> Percentile SSTables Write Latency Read Latency Partition Size Cell Count 
> (micros) (micros) (bytes) 
> 50% 24.00 14.24 182.79 103 1 
> 75% 24.00 17.08 315.85 149 2 
> 95% 24.00 20.50 7007.51 372 7 
> 98% 24.00 24.60 10090.81 642 12 
> 99% 24.00 29.52 12108.97 770 14 
> Min 21.00 3.31 29.52 43 0 
> Max 29.00 1358.10 62479.63 1597 35
> {noformat}
> Code for the read is here:
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L561
> We seem to iterate over all the sstables and increment the metric as part of 
> that iteration.
> Either the reporting is incorrect - or we should maybe check the bloom 
> filters first and then iterate the tombstones after?  
> In this particular case we were using TWCS which makes the problem more 
> apparent.  TWCS guarantees that we'll keep more sstables in an un-merged 
> state.  With STCS we have to search them all, but most of them should be 
> merged together if Compaction is keeping up.  LCS the read path is restricted 
> which will disguise the impact.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to