[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-15803:
-------------------------------------
    Description:
Currently, allow filtering can mean two things, both in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, if a query specifies the full partition key and also a criterion on a non-key field, it requires allow filtering.

The second case is significantly less work, because it only scans through a single partition. It is still more work than seeking to a specific row and reading N sequential rows, though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case.

I propose that we separate the way to specify allow filtering across an entire table (involving a scatter-gather) from specifying allow filtering across a partition, in a backwards-compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then, if you want to specify that you *only* want to scan within a partition, you would use something like

{{ALLOW FILTERING [WITHIN PARTITION]}}

That query would succeed if you specify non-key criteria within a single partition, but otherwise fail with a message saying that it requires the full allow filtering.

This would keep the full allow filtering backwards compatible, while letting a user specify that they want to scan only within a partition and get an error if the query would scan the full table.

This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303.
This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion).

  was:
Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering.

The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case.

I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition. So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. One way would be to have it be

{{ALLOW FILTERING [WITHIN PARTITION]}}

This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table.
This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion).

> Separate out allow filtering scanning through a partition versus scanning over the table
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15803
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15803
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Syntax
>            Reporter: Jeremy Hanna
>            Priority: Normal
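
To make the proposed semantics concrete, here is a minimal Python sketch of the check described in the issue. It is not Cassandra code: the names `Filtering`, `Query`, `required` and `validate` are hypothetical, and the model deliberately ignores clustering columns, secondary indexes and `IN` restrictions. It only captures the idea that the full allow filtering is the superset, while the proposed within-partition form is accepted only when the full partition key is specified.

```python
from dataclasses import dataclass
from enum import IntEnum


class Filtering(IntEnum):
    """Filtering levels, ordered so that FULL is a superset of the others."""
    NONE = 0              # no ALLOW FILTERING clause
    WITHIN_PARTITION = 1  # proposed: ALLOW FILTERING WITHIN PARTITION
    FULL = 2              # existing: ALLOW FILTERING


@dataclass(frozen=True)
class Query:
    """Simplified, hypothetical view of a SELECT statement."""
    partition_key: frozenset   # partition key columns of the table
    eq_columns: frozenset      # columns restricted by equality in WHERE
    filter_columns: frozenset  # non-key columns that carry a criterion
    clause: Filtering          # filtering level requested in the query text


def required(q: Query) -> Filtering:
    """Return the minimum filtering level the query needs in this model."""
    if not q.filter_columns:
        return Filtering.NONE
    # Full partition key given: filtering stays inside one partition.
    if q.partition_key <= q.eq_columns:
        return Filtering.WITHIN_PARTITION
    # Otherwise the criteria must be checked across the whole table.
    return Filtering.FULL


def validate(q: Query) -> Filtering:
    """Accept the query if its clause covers what it needs, else raise."""
    needed = required(q)
    if q.clause < needed:
        if needed == Filtering.FULL:
            raise ValueError("query scans the whole table; "
                             "it requires the full ALLOW FILTERING")
        raise ValueError("query filters on non-key columns; "
                         "it requires ALLOW FILTERING")
    return needed


# Example: full partition key plus a non-key criterion, with the proposed clause.
pk = frozenset({"sensor_id"})
in_partition = Query(pk, pk, frozenset({"status"}), Filtering.WITHIN_PARTITION)
validate(in_partition)  # accepted: the scan stays within one partition
```

Ordering the levels as integers keeps existing queries that use the full allow filtering valid for both cases, which matches the backwards-compatibility goal stated in the description. The same query without the partition key restriction would raise under `WITHIN_PARTITION` and pass under `FULL`.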