[ 
https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-15803:
-------------------------------------
    Description: 
Currently allow filtering can mean two things in the spirit of "avoid 
operations that don't seek to a specific row or sequential rows of data."  
First, it can mean scanning across the entire table to meet the criteria of the 
query.  That's almost always a bad thing and should be discouraged or disabled 
(see CASSANDRA-8303).  Second, it can mean filtering within a specific 
partition.  For example, in a query you could specify the full partition key 
and if you specify a criterion on a non-key field, it requires allow filtering.

The second reason to require allow filtering is significantly less work to scan 
through a partition.  It is still extra work over seeking to a specific row and 
getting N sequential rows though.  So while an application developer and/or 
operator needs to be cautious about this second type, it's not necessarily a 
bad thing, depending on the table and the use case.

I propose that we separate the way to specify allow filtering across an entire 
table (involving a scatter gather) from specifying allow filtering across a 
partition in a backwards compatible way.  One idea that was brought up in Slack 
in the cassandra-dev room was to have allow filtering mean the superset - 
scanning across the table.  Then if you want to specify that you *only* want to 
scan within a partition you would use something like

{{ALLOW FILTERING [WITHIN PARTITION]}}

So it will succeed if you specify non-key criteria within a single partition, 
but fail with a message to say it requires the full allow filtering.

 

This would allow for a backwards compatible full allow filtering while allowing 
a user to specify that they want to just scan within a partition, but error out 
if trying to scan a full table.

This is potentially also related to the capability limitation framework by 
which operators could more granularly specify what features are allowed or 
disallowed per user, discussed in CASSANDRA-8303.  This way an operator could 
disallow the more general allow filtering while allowing the partition scan (or 
disallow them both at their discretion).

  was:
Currently allow filtering can mean two things in the spirit of "avoid 
operations that don't seek to a specific row or sequential rows of data."  
First, it can mean scanning across the entire table to meet the criteria of the 
query.  That's almost always a bad thing and should be discouraged or disabled 
(see CASSANDRA-8303).  Second, it can mean filtering within a specific 
partition.  For example, in a query you could specify the full partition key 
and if you specify a criterion on a non-key field, it requires allow filtering.

The second reason to require allow filtering is significantly less work to scan 
through a partition.  It is still extra work over seeking to a specific row and 
getting N sequential rows though.  So while an application developer and/or 
operator needs to be cautious about this second type, it's not necessarily a 
bad thing, depending on the table and the use case.

I propose that we separate the way to specify allow filtering across an entire 
table (involving a scatter gather) from specifying allow filtering across a 
partition in a backwards compatible way.  One idea that was brought up in Slack 
in the cassandra-dev room was to have allow filtering mean the superset - 
scanning across the table.  Then if you want to specify that you *only* want to 
scan within a partition.  So it will succeed if you specify non-key criteria 
within a single partition, but fail with a message to say it requires the full 
allow filtering.  One way would be to have it be 

{{ALLOW FILTERING [WITHIN PARTITION]}}

This would allow for a backwards compatible full allow filtering while allowing 
a user to specify that they want to just scan within a partition, but error out 
if trying to scan a full table.

This is potentially also related to the capability limitation framework by 
which operators could more granularly specify what features are allowed or 
disallowed per user, discussed in CASSANDRA-8303.  This way an operator could 
disallow the more general allow filtering while allowing the partition scan (or 
disallow them both at their discretion).


> Separate out allow filtering scanning through a partition versus scanning 
> over the table
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15803
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15803
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Syntax
>            Reporter: Jeremy Hanna
>            Priority: Normal
>
> Currently allow filtering can mean two things in the spirit of "avoid 
> operations that don't seek to a specific row or sequential rows of data."  
> First, it can mean scanning across the entire table to meet the criteria of 
> the query.  That's almost always a bad thing and should be discouraged or 
> disabled (see CASSANDRA-8303).  Second, it can mean filtering within a 
> specific partition.  For example, in a query you could specify the full 
> partition key and if you specify a criterion on a non-key field, it requires 
> allow filtering.
> The second reason to require allow filtering is significantly less work to 
> scan through a partition.  It is still extra work over seeking to a specific 
> row and getting N sequential rows though.  So while an application developer 
> and/or operator needs to be cautious about this second type, it's not 
> necessarily a bad thing, depending on the table and the use case.
> I propose that we separate the way to specify allow filtering across an 
> entire table (involving a scatter gather) from specifying allow filtering 
> across a partition in a backwards compatible way.  One idea that was brought 
> up in Slack in the cassandra-dev room was to have allow filtering mean the 
> superset - scanning across the table.  Then if you want to specify that you 
> *only* want to scan within a partition you would use something like
> {{ALLOW FILTERING [WITHIN PARTITION]}}
> So it will succeed if you specify non-key criteria within a single partition, 
> but fail with a message to say it requires the full allow filtering.
>  
> This would allow for a backwards compatible full allow filtering while 
> allowing a user to specify that they want to just scan within a partition, 
> but error out if trying to scan a full table.
> This is potentially also related to the capability limitation framework by 
> which operators could more granularly specify what features are allowed or 
> disallowed per user, discussed in CASSANDRA-8303.  This way an operator could 
> disallow the more general allow filtering while allowing the partition scan 
> (or disallow them both at their discretion).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to