[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717146#comment-17717146 ]
Andres de la Peña commented on CASSANDRA-15803:
-----------------------------------------------
I think AF is currently required for any query that returns fewer rows than
those it retrieves from the storage engine. Under that logic, the
aforementioned example is correct in requiring AF:
{code:java}
create table ks.tb (id int, cl1 int, cl2 int, col1 int, primary key ((id), cl1, cl2));
// returns fewer rows than it reads, so it filters and requires AF
select * from ks.tb where id = 1 and cl1 = 2 and cl2 = 3 and col1 = 4;
{code}
Changing that would mean altering the semantics of AF.
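To make the "filter anything" rule concrete, here is a toy Java model of it. It is a sketch under loud assumptions, not Cassandra's restriction code. The class and method names are hypothetical, and the model ignores secondary indexes, multi-column restrictions and the many special cases the real CQL layer handles. It only flags the situations where a restriction must be checked after rows are read:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Cassandra's real restriction code) of the current
// "filter anything" rule: ALLOW FILTERING is required as soon as some restriction
// can only be checked after rows are read, i.e. whenever the query may return
// fewer rows than it reads from the storage engine.
final class AllowFilteringRuleSketch {
    enum Kind { PARTITION_KEY, CLUSTERING, REGULAR }

    // Schema of the example: primary key ((id), cl1, cl2), regular column col1.
    static final Map<String, Kind> EXAMPLE_SCHEMA = new LinkedHashMap<>();
    static {
        EXAMPLE_SCHEMA.put("id", Kind.PARTITION_KEY);
        EXAMPLE_SCHEMA.put("cl1", Kind.CLUSTERING);
        EXAMPLE_SCHEMA.put("cl2", Kind.CLUSTERING);
        EXAMPLE_SCHEMA.put("col1", Kind.REGULAR);
    }

    static boolean requiresAllowFiltering(Map<String, Kind> schema, List<String> restricted) {
        // 1. Restrictions on regular (non-key) columns are always post-read filters.
        for (String col : restricted) {
            if (schema.get(col) == Kind.REGULAR) return true;
        }
        // 2. Without the full partition key, any restriction means scanning the table.
        for (Map.Entry<String, Kind> e : schema.entrySet()) {
            if (e.getValue() == Kind.PARTITION_KEY && !restricted.contains(e.getKey())) {
                return !restricted.isEmpty();
            }
        }
        // 3. Clustering restrictions must form a prefix; skipping a column means filtering.
        boolean gap = false;
        for (Map.Entry<String, Kind> e : schema.entrySet()) {
            if (e.getValue() != Kind.CLUSTERING) continue;
            if (!restricted.contains(e.getKey())) gap = true;
            else if (gap) return true;
        }
        return false;
    }

    static boolean onExample(String... restricted) {
        return requiresAllowFiltering(EXAMPLE_SCHEMA, Arrays.asList(restricted));
    }

    public static void main(String[] args) {
        // The query above: full primary key plus col1 = 4.
        System.out.println(onExample("id", "cl1", "cl2", "col1")); // true
        // Same query without the col1 restriction: a direct seek, no filtering.
        System.out.println(onExample("id", "cl1", "cl2"));         // false
    }
}
```

Even though the example reads at most one row, rule 1 fires. That is exactly the "clear but of limited usefulness" behaviour described above.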
One might argue that the current semantics of AF are useless in some cases, like
the one mentioned above. However, I tend to think that the current semantics are
clear and easy to understand, albeit of limited usefulness in some cases. I
think the question is whether we want to change the current semantics from
"filter anything" to "filter over a potentially large dataset".
It is worth mentioning that AF is not the only way to prevent massive
filtering. We also have [read
thresholds|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1712-L1729]
that abort queries scanning too much data. That's probably more useful
in practice, and more accurate. However, read thresholds lack AF's ability to
complain as early as when the query is prepared.
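The difference in timing can be illustrated with a hypothetical Java sketch of a runtime read threshold. The class, the byte-based accounting and the warn/fail pair are assumptions made for illustration. They do not mirror the actual read-threshold settings in the linked cassandra.yaml section. The point is only that the check fires while the query runs, once the scanned data crosses a limit, and never at prepare time:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a runtime read threshold: unlike ALLOW FILTERING, which
// is checked statically when the query is prepared, this only fires during
// execution, once the amount of data actually scanned crosses a limit.
final class ReadThresholdSketch {
    static final class ReadAbortedException extends RuntimeException {
        ReadAbortedException(String msg) { super(msg); }
    }

    private final long warnBytes;
    private final long failBytes;
    private long scannedBytes = 0;
    private boolean warned = false;
    final List<String> warnings = new ArrayList<>();

    ReadThresholdSketch(long warnBytes, long failBytes) {
        this.warnBytes = warnBytes;
        this.failBytes = failBytes;
    }

    /** Accounts for one row read from storage, warning once or aborting past the limits. */
    void onRowRead(long rowSizeBytes) {
        scannedBytes += rowSizeBytes;
        if (scannedBytes > failBytes) {
            throw new ReadAbortedException("read aborted after scanning " + scannedBytes + " bytes");
        }
        if (scannedBytes > warnBytes && !warned) {
            warned = true;
            warnings.add("query scanned more than " + warnBytes + " bytes");
        }
    }

    long scannedBytes() { return scannedBytes; }
}
```

A query that filters heavily but happens to touch little data passes untouched, which is why this kind of check is more accurate than the static analysis behind AF.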
The static analysis of the query done by AF is, however, imprecise, and it
tends to make AF quite a frustrating requirement. So I'd be happy with just
removing AF and relying on config properties that limit capabilities, plus read
thresholds that abort queries.
{quote}I would go with a guardrail only if we do not want to make it granular
per table.
{quote}
Per-table guardrails sound like an interesting idea. Those guardrails could be
shipped as table properties limiting capabilities. Efforts on that front would
probably have more potential for reuse than extending the CQL grammar with
{{WITHIN PARTITION}}.
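As a rough illustration of the per-table idea, the Java sketch below reads a limit from the table's options and rejects queries that would filter over a wider scope. The property name `filtering_scope` and its values are invented for this example. Cassandra has no such table option, and how a query's scope gets computed is left out:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical per-table guardrail shipped as a table property. The property
// name "filtering_scope" and its values are invented for this sketch; Cassandra
// has no such table option.
final class TableGuardrailSketch {
    // Ordered from least to most permissive, so compareTo() expresses "wider than".
    enum Scope { NONE, PARTITION, TABLE }

    /** Reads the table's limit, defaulting to TABLE, i.e. no extra restriction. */
    static Scope allowedScope(Map<String, String> tableOptions) {
        String value = tableOptions.getOrDefault("filtering_scope", "table");
        return Scope.valueOf(value.toUpperCase(Locale.ROOT));
    }

    /** Fails when a query would filter over a wider scope than the table permits. */
    static void check(Map<String, String> tableOptions, Scope queryScope) {
        Scope allowed = allowedScope(tableOptions);
        if (queryScope.compareTo(allowed) > 0) {
            throw new IllegalStateException("filtering over " + queryScope
                    + " is not permitted on this table (limit: " + allowed + ")");
        }
    }
}
```

Because the limit lives with the table, no grammar change is needed. The same query text is accepted or rejected depending on the table's configuration.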
{quote}The question is what do you do with index queries that filter across
multiple rows? Do you consider it as equivalent to a partition scan?
{quote}
That's indeed a tricky question. Normally {{WITHIN PARTITION}} would assume
filtering on a single partition. An index query, however, would do filtering on
a single partition on each node in the cluster. And I don't think we want to
further complicate the grammar with
{{ALLOW FILTERING WITHIN NODE WITHIN PARTITION}} or something like that.
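For discussion, here is a hypothetical Java model of the proposed semantics. Plain {{ALLOW FILTERING}} keeps its current meaning. {{ALLOW FILTERING WITHIN PARTITION}} only accepts single-partition queries. Each form is gated by its own operator switch, standing in for the capability framework of CASSANDRA-8303. Treating index queries as "not single partition" is an assumption, one possible answer to the open question above. All names and error messages are illustrative, not an existing Cassandra API:

```java
import java.util.Optional;

// Hypothetical model of the proposed semantics, not an existing Cassandra API:
// plain ALLOW FILTERING keeps its current meaning (any scope), ALLOW FILTERING
// WITHIN PARTITION only accepts single-partition queries, and each form is
// gated by its own operator-level switch.
final class WithinPartitionSketch {
    enum Reach { SINGLE_PARTITION, INDEX_ON_EVERY_NODE, WHOLE_TABLE }
    enum Clause { NONE, ALLOW_FILTERING, ALLOW_FILTERING_WITHIN_PARTITION }

    /** Stand-in for per-user or per-table capability limits (cf. CASSANDRA-8303). */
    static final class Policy {
        final boolean allowFiltering;
        final boolean allowFilteringWithinPartition;

        Policy(boolean allowFiltering, boolean allowFilteringWithinPartition) {
            this.allowFiltering = allowFiltering;
            this.allowFilteringWithinPartition = allowFilteringWithinPartition;
        }
    }

    /** Returns an error message if the query is rejected, or empty if it may run. */
    static Optional<String> validate(boolean needsFiltering, Reach reach,
                                     Clause clause, Policy policy) {
        if (!needsFiltering) {
            return Optional.empty(); // no post-read filtering, so no clause is needed
        }
        switch (clause) {
            case ALLOW_FILTERING:
                return policy.allowFiltering
                        ? Optional.empty()
                        : Optional.of("ALLOW FILTERING is disabled");
            case ALLOW_FILTERING_WITHIN_PARTITION:
                // An index query filters on every node, so this sketch treats it
                // like a multi-partition query; that choice is an assumption.
                if (reach != Reach.SINGLE_PARTITION) {
                    return Optional.of("query is not restricted to one partition; "
                            + "it requires ALLOW FILTERING");
                }
                return policy.allowFilteringWithinPartition
                        ? Optional.empty()
                        : Optional.of("ALLOW FILTERING WITHIN PARTITION is disabled");
            default:
                return Optional.of("query requires ALLOW FILTERING");
        }
    }
}
```

With `new Policy(false, true)`, an operator would get the setup described in the issue: table-wide filtering is refused, while filtering inside a single partition is still possible.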
> Separate out allow filtering scanning through a partition versus scanning
> over the table
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15803
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL/Syntax
> Reporter: Jeremy Hanna
> Assignee: Stefan Miklosovic
> Priority: Normal
>
> Currently allow filtering can mean two things in the spirit of "avoid
> operations that don't seek to a specific row or sequential rows of data."
> First, it can mean scanning across the entire table to meet the criteria of
> the query. That's almost always a bad thing and should be discouraged or
> disabled (see CASSANDRA-8303). Second, it can mean filtering within a
> specific partition. For example, in a query you could specify the full
> partition key and if you specify a criterion on a non-key field, it requires
> allow filtering.
> The second case, scanning through a single partition, is significantly less
> work. It is still extra work compared to seeking to a specific row and getting
> N sequential rows, though. So while an application developer and/or operator
> needs to be cautious about this second type, it's not necessarily a bad
> thing, depending on the table and the use case.
> I propose that we separate the way to specify allow filtering across an
> entire table from specifying allow filtering across a partition in a
> backwards compatible way. One idea that was brought up in Slack in the
> cassandra-dev room was to have allow filtering mean the superset - scanning
> across the table. Then if you want to specify that you *only* want to scan
> within a partition you would use something like
> {{ALLOW FILTERING [WITHIN PARTITION]}}
> So it would succeed if you specify non-key criteria within a single partition,
> but fail otherwise, with a message saying it requires the full allow filtering. This
> would allow for a backwards compatible full allow filtering while allowing a
> user to specify that they want to just scan within a partition, but error out
> if trying to scan a full table.
> This is potentially also related to the capability limitation framework by
> which operators could more granularly specify what features are allowed or
> disallowed per user, discussed in CASSANDRA-8303. This way an operator could
> disallow the more general allow filtering while allowing the partition scan
> (or disallow them both at their discretion).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)