[jira] [Comment Edited] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

Alex Petrov (JIRA) Thu, 25 Aug 2016 02:18:51 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436560#comment-15436560
 ]


Alex Petrov edited comment on CASSANDRA-11031 at 8/25/16 9:17 AM:
------------------------------------------------------------------

Thank you for the review! I've addressed your comments and wait for the test 
results now. All changes are done in a separate commit to make it simpler to 
see what's been done, I'll combine them if you +1.

bq. The hasSlice method is now duplicated in PartitionKeySingleRestrictionSet 
and ClusteringColumnRestrictions. It would be better to put it only into 
RestrictionSet and expose it into Restrictions.

Removed the redundant method, pulled it up.

bq. We should have some unit test for MaterializedViews with Slice and 
unrestricted partition keys.

Added the MV tests, similar to the previously existing ones.

bq. In StatementRestrictions::processPartitionKeyRestrictions I do not think 
that the error message: "Partition key parts: %s must be restricted as other 
parts are" makes sense any more (for non-MVs). We should probably only use the 
ALLOW FILTERING error message.

Now we just have one error message, same everywhere. Tests adjusted accordingly.

bq. The part of StatementRestrictions::processPartitionKeyRestrictions dealing 
with secondary index can probably be combined with the one dealing with the 
filtering.

True, I've separated them for sakes of simplicity, but I hope that new version 
looks clean as well. Combining them furhter is tricky and will require 
double-checking of non-restricted PK parts and whether or not restrictions are 
empty. It was a bit simpler previously since we did't have to take care of 
slices.

bq. In RowFilter we should not use Lambda expression to filter out the rows as 
they are pretty bad from the performance point of view and result in a lot of 
garbage. In RowFilter: Iterables.size(rowLevelExpressions) can be replaced by 
rowLevelExpressions.size()

Optimised that part as well, got rid of streams alltogether, we iterate through 
restrictions just once now.

bq. The tests using secondary indices should be put into SecondaryIndexTest

Moved the tests.

bq. It will be good to add more tests into 
SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken: with > on 
one of the partition key or only the second partition key being specified

I've added one more clause that seemed missing, although since token 
restrictions require both parts for the token function, I did not add anything 
else.

bq. In SelectTest::testAllowFilteringOnPartitionKey the message IN restrictions 
are not supported on indexed columns looks wrong to me as the test do not use 
any secondary index.

For clarity here, I've added a more elaborate message that specifies that 
filtering is involved.

|[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]|


was (Author: ifesdjeen):
Thank you for the review! I've addressed your comments and wait for the test 
results now. All changes are done in a separate commit to make it simpler to 
see what's been done, I'll combine them if you +1.

bq. The hasSlice method is now duplicated in PartitionKeySingleRestrictionSet 
and ClusteringColumnRestrictions. It would be better to put it only into 
RestrictionSet and expose it into Restrictions.

Removed the redundant method, pulled it up.

bw. We should have some unit test for MaterializedViews with Slice and 
unrestricted partition keys.

Added the MV tests, similar to the previously existing ones.

bq. In StatementRestrictions::processPartitionKeyRestrictions I do not think 
that the error message: "Partition key parts: %s must be restricted as other 
parts are" makes sense any more (for non-MVs). We should probably only use the 
ALLOW FILTERING error message.

Now we just have one error message, same everywhere. Tests adjusted accordingly.

bq. The part of StatementRestrictions::processPartitionKeyRestrictions dealing 
with secondary index can probably be combined with the one dealing with the 
filtering.

True, I've separated them for sakes of simplicity, but I hope that new version 
looks clean as well. Combining them furhter is tricky and will require 
double-checking of non-restricted PK parts and whether or not restrictions are 
empty. It was a bit simpler previously since we did't have to take care of 
slices.

bq. In RowFilter we should not use Lambda expression to filter out the rows as 
they are pretty bad from the performance point of view and result in a lot of 
garbage. In RowFilter: Iterables.size(rowLevelExpressions) can be replaced by 
rowLevelExpressions.size()

Optimised that part as well, got rid of streams alltogether, we iterate through 
restrictions just once now.

bq. The tests using secondary indices should be put into SecondaryIndexTest

Moved the tests.

bq. It will be good to add more tests into 
SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken: with > on 
one of the partition key or only the second partition key being specified

I've added one more clause that seemed missing, although since token 
restrictions require both parts for the token function, I did not add anything 
else.

bq. In SelectTest::testAllowFilteringOnPartitionKey the message IN restrictions 
are not supported on indexed columns looks wrong to me as the test do not use 
any secondary index.

For clarity here, I've added a more elaborate message that specifies that 
filtering is involved.

|[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]|

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-11031
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>            Priority: Minor
>             Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>       tenant_id text,
>       pk2 text,
>       c1 text,
>       c2 text,
>       v1 text,
>       v2 text,
>       PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

Reply via email to