[
https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110307#comment-15110307
]
Branimir Lambov commented on CASSANDRA-8180:
--------------------------------------------
bq. What is an example of any other incomplete prefix and do we have a gap in
the tests then?
Tombstones. A {{DELETE WHERE pk = ? AND ck1 = ?}} in a table with key {{(pk,
ck1, ck2)}} will generate one.
bq. What I don't understand is how things like shouldInclude() in
ClusteringIndexNamesFilter or ClusteringIndexSliceFilter work.
If you look at the callsites for the method, you will see that they do more
work in the presence of tombstones. So one solution is not to use the
{{min/maxClusteringValues}} in that case.
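A minimal sketch of that idea, in Python rather than Cassandra's Java. {{SSTableStats}}, {{can_skip}} and the integer clustering values are hypothetical stand-ins, not the actual filter code: the min/max values are trusted for skipping only when the sstable has no tombstones.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    """Hypothetical per-sstable metadata; not Cassandra's real class."""
    min_clustering: int
    max_clustering: int
    has_tombstones: bool


def can_skip(stats: SSTableStats, slice_start: int, slice_end: int) -> bool:
    """Return True if the sstable cannot intersect [slice_start, slice_end]."""
    if stats.has_tombstones:
        # Tombstones may use an incomplete clustering prefix, so the
        # min/max values are not trusted here; always read the sstable.
        return False
    return stats.max_clustering < slice_start or stats.min_clustering > slice_end
```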
bq. \[MetadataSerializer.deserialize()\] should receive the total size to work
out if there is more stuff to read at the end.
No need for that, you can set a flag in {{Version}} to tell you whether or not
the information is present.
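As a rough illustration of the flag approach (Python, with a hypothetical {{Version}} class and a made-up two-field layout, not the real {{MetadataSerializer}} format), the reader decides from the version alone whether the extra field follows, so it never needs the total size:

```python
import io
import struct


class Version:
    """Hypothetical stand-in for an sstable format version."""

    def __init__(self, name, has_clustering_bounds):
        self.name = name
        # Flag telling the reader whether the optional field was written.
        self.has_clustering_bounds = has_clustering_bounds


def serialize(version, row_count, bounds=None):
    out = struct.pack(">I", row_count)
    if version.has_clustering_bounds:
        low, high = bounds
        out += struct.pack(">ii", low, high)
    return out


def deserialize(version, data):
    stream = io.BytesIO(data)
    (row_count,) = struct.unpack(">I", stream.read(4))
    bounds = None
    if version.has_clustering_bounds:
        # Only newer versions carry the bounds; older files end here.
        bounds = struct.unpack(">ii", stream.read(8))
    return row_count, bounds
```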
bq. I'm not sure what you mean, \[use a RangeTombstoneBound\] for the test or
the fix?
This is the fix. Instead of an empty row, the lower bound should be a
{{RangeTombstoneBound}} as described.
bq. The global lower bound is free, since it is available in the metadata. The
index lower bound is more accurate but it requires seeking the index file.
In the way you use this class, by the time {{lowerBound()}} is called, all of
this is already done (by {{UnfilteredRowMergeIterator.create}}), possibly
unnecessarily (if {{MergeIterator.OneToOne}} is to be used). I would just move
finding the bound to {{lowerBound()}}, and I don't think it's even necessary to
save the bound; just retrieve it there, since the method won't be called more than
once.
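To make the suggestion concrete, here is a simplified sketch in Python, not the patch itself. {{LazySource}} and {{merge_with_limit}} are hypothetical names, keys are plain integers in ascending order, and appending to {{opened}} stands in for a disk seek. The bound is looked up inside {{lower_bound()}} without being cached, and a source is opened only once its bound reaches the head of the merge:

```python
import heapq


class LazySource:
    """Hypothetical sstable reader that opens its data only on demand."""

    def __init__(self, name, min_key, rows, opened):
        self.name = name
        self._min_key = min_key  # stands in for metadata min clustering value
        self._rows = rows        # sorted ascending
        self._opened = opened    # shared list recording which sources were read

    def lower_bound(self):
        # Retrieved here, not at construction; called once per merge.
        return self._min_key

    def open(self):
        self._opened.append(self.name)  # stands in for a disk seek
        return iter(self._rows)


def merge_with_limit(sources, limit):
    # Kind 0 is a lower-bound placeholder, kind 1 is a real row.
    heap = [(src.lower_bound(), 0, idx, None) for idx, src in enumerate(sources)]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < limit:
        key, kind, idx, rows = heapq.heappop(heap)
        if kind == 0:
            rows = sources[idx].open()  # bound reached the head: open now
        else:
            result.append(key)
        nxt = next(rows, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, 1, idx, rows))
    return result
```

With three single-row sources holding 10, 20 and 30 and a limit of 1, only the first source is opened; the other two never leave the placeholder state.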
> Optimize disk seek using min/max column name meta data when the LIMIT clause
> is used
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Environment: Cassandra 2.0.10
> Reporter: DOAN DuyHai
> Assignee: Stefania
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 8180_001.yaml, 8180_002.yaml
>
>
> I was working on an example sensor data table (time series) and ran into a use
> case where C* does not optimize reads on disk.
> {code}
> cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col))
> WITH CLUSTERING ORDER BY (col DESC);
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
> ...
> >nodetool flush test test
> {code}
> After that, I activate request tracing:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;
> activity | timestamp | source | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
> execute_cql3_query | 23:48:46,498 | 127.0.0.1 | 0
> Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 | 74
> Preparing statement | 23:48:46,499 | 127.0.0.1 | 253
> Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 | 930
> Acquiring sstable references | 23:48:46,499 | 127.0.0.1 | 943
> Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 | 1032
> Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 | 1160
> Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1173
> Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 | 1889
> Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1901
> Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 | 2373
> Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 | 2384
> Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 | 2768
> Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 | 2784
> Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 | 2976
> Request complete | 23:48:46,501 | 127.0.0.1 | 3551
> {code}
> We can clearly see that C* hits 3 SSTables on disk instead of just one,
> although it has the min/max column metadata to decide which SSTable contains
> the most recent data.
> Funnily enough, if we add a clause on the clustering column to the SELECT, this
> time C* optimizes the read path:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1;
> activity | timestamp | source | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
> execute_cql3_query | 23:52:31,888 | 127.0.0.1 | 0
> Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 | 60
> Preparing statement | 23:52:31,888 | 127.0.0.1 | 277
> Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 | 961
> Acquiring sstable references | 23:52:31,889 | 127.0.0.1 | 971
> Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 | 1020
> Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 | 1108
> Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 | 1117
> Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 | 1611
> Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 | 1624
> Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 | 1700
> Request complete | 23:52:31,890 | 127.0.0.1 | 2140
> {code}