This is the draft for FILTERING ON|OFF in the shell.

I would say this is the simplest solution.

We may still consider the table option, but what do you think about simply 
setting it via the shell?

https://github.com/apache/cassandra/pull/2141/files

________________________________________
From: Josh McKenzie <jmcken...@apache.org>
Sent: Friday, February 3, 2023 23:39
To: dev
Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables


they would start to set ALLOW FILTERING here and there in order to not think 
twice about their data model so they can just call it a day.
Setting this on a per-table basis, or having users set it on specific queries 
that hit tables and then forget they set it, is six of one and half a dozen of 
the other.

I like the table property idea personally. It communicates intent about the 
data model, and about the expected size and usage of the data, embedding 
context in the schema that there is currently no mechanism to communicate.

On Fri, Feb 3, 2023, at 5:00 PM, Miklosovic, Stefan wrote:
Yes, there would be a discrepancy. I do not like that either. If it was only 
about "normal tables vs virtual tables", I could live with that. But the fact 
that there are going to be differences among vtables themselves starts to be a 
little bit messy. Then we would need to let operators know which tables are 
always allowed to be filtered on and which are not, and that just complicates 
it. Putting that information into the comment so it is visible in DESCRIBE is 
a nice idea.

The flag we are talking about would be used purely internally; it would not be 
part of the schema and so would not be gossiped.

Also, I am starting to like the suggestion to have something like ALLOW 
FILTERING ON in CQLSH so it would be turned on for the whole CQL session. That 
leaves tables as they are and it should not be a big deal for operators to set. 
We would have to make sure to add the "ALLOW FILTERING" clause to every SELECT 
statement (to virtual tables only?) a user submits. I am not sure if this is 
doable yet though.
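
To make the shell-side idea concrete, here is a minimal Python sketch (cqlsh 
is written in Python) of appending the clause to SELECT statements while a 
session-level switch is on. The function name with_allow_filtering and the 
filtering_on parameter are hypothetical; they are not part of cqlsh or of the 
linked draft. The matching is purely textual, so it ignores comments and does 
not restrict itself to virtual tables.

```python
import re

# Hypothetical helper for illustration only; not taken from cqlsh.
_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
_ALLOW_FILTERING_RE = re.compile(r"\bALLOW\s+FILTERING\s*;?\s*$", re.IGNORECASE)


def with_allow_filtering(statement: str, filtering_on: bool) -> str:
    """Append ALLOW FILTERING to a SELECT when the session switch is on."""
    if not filtering_on or not _SELECT_RE.match(statement):
        return statement  # switch is off, or not a SELECT: leave untouched
    if _ALLOW_FILTERING_RE.search(statement):
        return statement  # the user already wrote the clause
    body = statement.rstrip()
    has_semicolon = body.endswith(";")
    if has_semicolon:
        body = body[:-1].rstrip()
    return body + " ALLOW FILTERING" + (";" if has_semicolon else "")


if __name__ == "__main__":
    # Table and column names are only an example.
    print(with_allow_filtering(
        "SELECT * FROM system_views.clients WHERE username = 'cassandra';",
        filtering_on=True,
    ))
```

Statements other than SELECT, and SELECTs that already end with the clause, 
pass through unchanged, so the switch could be left on for a whole session.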

________________________________________
From: David Capwell <dcapw...@apple.com>
Sent: Friday, February 3, 2023 22:42
To: dev
Cc: Maxim Muzafarov
Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables


I don't think the assumption that "virtual tables will always be small and 
always fit in memory" is a safe one.

Agreed. There is a repair ticket to have the coordinating node do network 
queries to peers to resolve the table (rather than the operator querying 
everything, the coordinator node does it for you), so this assumption may not 
be true down the line.

I could be open to a table property that says whether ALLOW FILTERING is on by 
default or not… then we can pick and choose vtables (or have vtables opt out). 
I kinda dislike the lack of consistency with this approach though.

On Feb 3, 2023, at 11:24 AM, C. Scott Andreas <sc...@paradoxica.net> wrote:

There are some ideas that development community members have kicked around that 
may falsify the assumption that "virtual tables are tiny and will fit in 
memory."

One example is CASSANDRA-14629: Abstract Virtual Table for very large result 
sets
https://issues.apache.org/jira/browse/CASSANDRA-14629

Chris's proposal here is to enable query results from virtual tables to be 
streamed to the client rather than being fully materialized. There are some 
neat possibilities suggested in this ticket, such as debug functionality to 
dump the contents of a raw SSTable via the CQL interface, or the contents of 
the database's internal caches. One could also imagine a feature like this 
providing functionality similar to a foreign data wrapper in other databases.

I don't think the assumption that "virtual tables will always be small and 
always fit in memory" is a safe one.

I don't think we should implicitly add "ALLOW FILTERING" to all queries against 
virtual tables because of this, in addition to the concern about departing from 
standard CQL semantics for a type of table deemed special.

– Scott

On Feb 3, 2023, at 6:52 AM, Maxim Muzafarov <mmu...@apache.org> wrote:


Hello Stefan,

Regarding the decision to implicitly enable ALLOW FILTERING for
virtual tables, which also makes sense to me, it may be necessary to
consider changing the clustering columns in the virtual table metadata
to regular columns as well. The reasons are the same as mentioned
earlier: the virtual tables hold their data in memory, thus we do not
benefit from the advantages of ordered data (e.g. the ClientsTable and
its ClusteringColumn(PORT)).

Changing the clustering column to a regular column may simplify the
virtual table data model, but I'm afraid it may affect users who rely
on the table metadata.



On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña <adelap...@apache.org> wrote:

I think removing the need for ALLOW FILTERING on virtual tables makes sense and 
would be quite useful for operators.

That guard exists for performance issues that shouldn't occur on virtual 
tables. We also have a flag in case some future virtual table implementation 
has limitations regarding filtering, although it seems it's not the case with 
any of the existing virtual tables.

It is not like we would promote bad habits because virtual tables are meant to 
be queried by operators / administrators only.


It might even be quite the opposite, since in the current situation users might 
get used to routinely using ALLOW FILTERING for querying their virtual tables.

It has been mentioned on the #cassandra-dev Slack thread where this started (1) 
that it's kind of an API inconsistency to allow querying by non-primary keys on 
virtual tables without ALLOW FILTERING, whereas it's required for regular 
tables. I think that a simple doc update saying that virtual tables, which are 
not regular tables, support filtering would be enough. Virtual tables are well 
identified by both the keyspace they belong to and the docs, so users shouldn't 
have trouble knowing whether a table is virtual. It would be similar to the 
current exception for ALLOW FILTERING, where one needs to use it unless the 
table has an index for the queried column.

(1) https://the-asf.slack.com/archives/CK23JSY2K/p1675352759267329

On Fri, 3 Feb 2023 at 09:09, Miklosovic, Stefan <stefan.mikloso...@netapp.com> 
wrote:

Hi list,

the content of virtual tables is held in memory (and / or is fetched every time 
upon request). When querying such a table on a column outside of the primary 
key, users are normally required to specify ALLOW FILTERING. This makes total 
sense for "ordinary tables", so that applications have performant and effective 
queries, but it kind of loses its applicability for virtual tables when a table 
literally holds just a handful of entries in memory and it just does not 
matter, does it?

What do you think about implicitly allowing filtering for virtual tables so we 
save ourselves from these pesky errors when we want to query an arbitrary 
column and need to satisfy the CQL spec just to do that?

It is not like we would promote bad habits because virtual tables are meant to 
be queried by operators / administrators only.

We can also explicitly document this behavior.

Among other options, we may try to implement secondary indices on virtual 
tables but I am not completely sure this is what we want because of its 
complexity etc. Is it even necessary to put such complex logic in place just to 
be able to select on any column across a few entries in memory?

I put together a draft here (1). It would only ever be possible to implicitly 
allow filtering on virtual tables, and it would be the implementor's 
responsibility to decide that, per table.
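
As a rough illustration of that per-table decision, here is a simplified 
Python model of the check. It is not the code from the draft: the names 
Table, implicit_filtering, check_select and FilteringRequired are made up for 
this sketch, and the rule "any restricted column outside the primary key 
needs filtering" is a deliberate simplification of the real CQL restrictions.

```python
from dataclasses import dataclass


class FilteringRequired(Exception):
    """Raised when a query would need ALLOW FILTERING but did not specify it."""


@dataclass
class Table:
    name: str
    primary_key: tuple
    is_virtual: bool = False
    # Hypothetical per-table flag, set by whoever implements the virtual table.
    implicit_filtering: bool = False


def check_select(table, restricted_columns, allow_filtering=False):
    """Reject a SELECT that filters without permission (simplified rule)."""
    needs_filtering = any(c not in table.primary_key for c in restricted_columns)
    if not needs_filtering or allow_filtering:
        return
    # Only virtual tables can opt in; regular tables always keep the guard.
    if table.is_virtual and table.implicit_filtering:
        return
    raise FilteringRequired(
        f"query on {table.name} requires ALLOW FILTERING"
    )


if __name__ == "__main__":
    clients = Table("system_views.clients", ("address", "port"),
                    is_virtual=True, implicit_filtering=True)
    users = Table("app.users", ("id",))

    check_select(clients, ["username"])      # accepted: the vtable opted in
    try:
        check_select(users, ["email"])       # regular table: still guarded
    except FilteringRequired as e:
        print(e)
```

Keeping the flag on the table implementation rather than in the schema means 
a future virtual table that cannot filter cheaply simply leaves it off.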

For all virtual tables we currently have, I would enable this everywhere. I do 
not think there is any virtual table where we would not want to enable it or 
where people HAVE TO specify that.

(1) https://github.com/apache/cassandra/pull/2131




