Just make virtual table implementations decide?

Add a method to the VirtualTable interface to indicate whether this is
desirable, and call it a day?
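
A minimal sketch of what that could look like. All names here are hypothetical (VirtualTableSketch, allowFilteringImplicitly, FilteringCheck, the "sstable_dump" table); this is a stand-in interface, not the actual Cassandra code or the contents of the linked PRs. The default is conservative, so a table has to opt in explicitly:

```java
// Hypothetical stand-in for the VirtualTable interface; illustrative only.
interface VirtualTableSketch {
    String name();

    // Each implementation decides whether queries that filter on
    // non-primary-key columns may run without an explicit ALLOW FILTERING.
    // Defaults to false so that new tables keep standard CQL semantics.
    default boolean allowFilteringImplicitly() {
        return false;
    }
}

// A table holding a handful of rows in memory opts in.
final class SmallInMemoryTable implements VirtualTableSketch {
    public String name() { return "clients"; }

    @Override
    public boolean allowFilteringImplicitly() { return true; }
}

// A table that might stream a very large result set keeps the default.
final class LargeStreamingTable implements VirtualTableSketch {
    public String name() { return "sstable_dump"; }
}

final class FilteringCheck {
    // Returns true when a SELECT must be rejected: it needs filtering,
    // the user did not write ALLOW FILTERING, and the table did not opt in.
    static boolean mustReject(VirtualTableSketch table,
                              boolean needsFiltering,
                              boolean hasAllowFilteringClause) {
        if (!needsFiltering) {
            return false;
        }
        return !(hasAllowFilteringClause || table.allowFilteringImplicitly());
    }
}

final class ImplicitFilteringSketch {
    public static void main(String[] args) {
        VirtualTableSketch small = new SmallInMemoryTable();
        VirtualTableSketch large = new LargeStreamingTable();
        System.out.println(small.name() + " rejected: "
                + FilteringCheck.mustReject(small, true, false));
        System.out.println(large.name() + " rejected: "
                + FilteringCheck.mustReject(large, true, false));
    }
}
```

With this shape the flag stays internal to the implementation and never has to appear in the schema, and tables that may grow large simply do not override the default.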

> On 6 Feb 2023, at 09:41, Benjamin Lerer <b.le...@gmail.com> wrote:
> 
> Making ALLOW FILTERING a table option implies giving the person creating
> the table the ability to change the way the server behaves for that table,
> which might not be something that every C* operator wants. Of course, we
> can allow operators to control that through the ALLOW FILTERING guardrail.
> At that point we would also need to have a default setting for the entire
> database.
> 
> On Fri, 3 Feb 2023 at 23:44, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
>> This is the draft for FILTERING ON|OFF in shell.
>> 
>> I would say this is the simplest solution.
>> 
>> We may still consider a table option, but what do you think about having
>> it simply set via the shell?
>> 
>> https://github.com/apache/cassandra/pull/2141/files
>> 
>> ________________________________________
>> From: Josh McKenzie <jmcken...@apache.org>
>> Sent: Friday, February 3, 2023 23:39
>> To: dev
>> Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables
>> 
>>> they would start to set ALLOW FILTERING here and there in order to not
>>> think twice about their data model so they can just call it a day.
>> 
>> Setting this on a per-table basis or having users set this on specific
>> queries that hit tables and forgetting they set it are six of one and
>> half a dozen of the other.
>> 
>> I like the table property idea personally. It communicates an intent about
>> the data model, and an expectation about the size and usage of the data,
>> embedding context in the schema that there's currently no mechanism to
>> communicate.
>> 
>> On Fri, Feb 3, 2023, at 5:00 PM, Miklosovic, Stefan wrote:
>> Yes, there would be a discrepancy. I do not like that either. If it was
>> only about "normal tables vs virtual tables", I could live with that. But
>> the fact that there are going to be differences among vtables themselves,
>> that starts to be a little bit messy. Then we would need to let operators
>> know which tables are always allowed to be filtered on and which are not,
>> and that just complicates it. Putting that information into a comment so it
>> is visible in DESCRIBE is a nice idea.
>> 
>> That flag we talk about ... that flag would be used purely internally; it
>> would not be part of the schema that is gossiped.
>> 
>> Also, I am starting to like the suggestion to have something like ALLOW
>> FILTERING ON in CQLSH so it would be turned on for the whole CQL session.
>> That leaves tables as they are and it should not be a big deal for
>> operators to set. We would have to make sure to add the "ALLOW FILTERING"
>> clause to every SELECT statement (to virtual tables only?) a user submits.
>> I am not sure if this is doable yet though.
>> 
>> ________________________________________
>> From: David Capwell <dcapw...@apple.com>
>> Sent: Friday, February 3, 2023 22:42
>> To: dev
>> Cc: Maxim Muzafarov
>> Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables
>> 
>>> I don't think the assumption that "virtual tables will always be small and
>>> always fit in memory" is a safe one.
>> 
>> Agreed, there is a repair ticket to have the coordinating node do network
>> queries to peers to resolve the table (rather than the operator querying
>> everything, allow the coordinator node to do it for you)… so this
>> assumption may not be true down the line.
>> 
>> I could be open to a table property that says ALLOW FILTERING is on by
>> default or not… then we can pick and choose vtables (or have vtables
>> opt out)… I kinda dislike the lack of consistency with this approach though.
>> 
>> On Feb 3, 2023, at 11:24 AM, C. Scott Andreas <sc...@paradoxica.net> wrote:
>> 
>> There are some ideas that development community members have kicked around 
>> that may falsify the assumption that "virtual tables are tiny and will fit 
>> in memory."
>> 
>> One example is CASSANDRA-14629: Abstract Virtual Table for very large result 
>> sets
>> https://issues.apache.org/jira/browse/CASSANDRA-14629
>> 
>> Chris's proposal here is to enable query results from virtual tables to be 
>> streamed to the client rather than being fully materialized. There are some 
>> neat possibilities suggested in this ticket, such as debug functionality to 
>> dump the contents of a raw SSTable via the CQL interface, or the contents of 
>> the database's internal caches. One could also imagine a feature like this 
>> providing functionality similar to a foreign data wrapper in other databases.
>> 
>> I don't think the assumption that "virtual tables will always be small and 
>> always fit in memory" is a safe one.
>> 
>> Because of this, I don't think we should implicitly add "ALLOW FILTERING"
>> to all queries against virtual tables, in addition to my concern about
>> departing from standard CQL semantics for a type of table deemed special.
>> 
>> – Scott
>> 
>> On Feb 3, 2023, at 6:52 AM, Maxim Muzafarov <mmu...@apache.org> wrote:
>> 
>> 
>> Hello Stefan,
>> 
>> Regarding the decision to implicitly enable ALLOW FILTERING for
>> virtual tables, which also makes sense to me, it may be necessary to
>> consider changing the clustering columns in the virtual table metadata
>> to regular columns as well. The reasons are the same as mentioned
>> earlier: the virtual tables hold their data in memory, thus we do not
>> benefit from the advantages of ordered data (e.g. the ClientsTable and
>> its ClusteringColumn(PORT)).
>> 
>> Changing the clustering column to a regular column may simplify the
>> virtual table data model, but I'm afraid it may affect users who rely
>> on the table metadata.
>> 
>> 
>> 
>> On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña <adelap...@apache.org> wrote:
>> 
>> I think removing the need for ALLOW FILTERING on virtual tables makes sense 
>> and would be quite useful for operators.
>> 
>> That guard exists for performance issues that shouldn't occur on virtual 
>> tables. We also have a flag in case some future virtual table implementation 
>> has limitations regarding filtering, although it seems it's not the case 
>> with any of the existing virtual tables.
>> 
>>> It is not like we would promote bad habits because virtual tables are
>>> meant to be queried by operators / administrators only.
>> 
>> 
>> It might even be quite the opposite, since in the current situation users
>> might get used to routinely using ALLOW FILTERING for querying their
>> virtual tables.
>> 
>> It has been mentioned on the #cassandra-dev Slack thread where this started
>> (1) that it's kind of an API inconsistency to allow querying by non-primary
>> key columns on virtual tables without ALLOW FILTERING, whereas it's required
>> for regular tables. I think that a simple doc update saying that virtual
>> tables, which are not regular tables, support filtering would be enough.
>> Virtual tables are well identified by both the keyspace they belong to and
>> the docs, so users shouldn't have trouble knowing whether a table is
>> virtual. It would be similar to the current exception for ALLOW FILTERING,
>> where one needs to use it unless the table has an index on the queried
>> column.
>> 
>> (1) https://the-asf.slack.com/archives/CK23JSY2K/p1675352759267329
>> 
>> On Fri, 3 Feb 2023 at 09:09, Miklosovic, Stefan
>> <stefan.mikloso...@netapp.com> wrote:
>> 
>> Hi list,
>> 
>> the content of virtual tables is held in memory (and / or is fetched every
>> time upon request). When querying such a table on a column outside of the
>> primary key, users are normally required to specify ALLOW FILTERING. This
>> makes total sense for "ordinary tables", so that applications have
>> performant and effective queries, but it kind of loses its applicability
>> for virtual tables when a table literally holds just a handful of entries
>> in memory and it just does not matter, does it?
>> 
>> What do you think about implicitly allowing filtering for virtual tables so
>> we save ourselves from these pesky errors when we want to query an
>> arbitrary column and we need to satisfy the CQL spec just to do that?
>> 
>> It is not like we would promote bad habits because virtual tables are meant 
>> to be queried by operators / administrators only.
>> 
>> We can also explicitly document this behavior.
>> 
>> Among other options, we may try to implement secondary indices on virtual
>> tables, but I am not completely sure this is what we want because of the
>> complexity. Is it even necessary to put such complex logic in place just to
>> be able to select on any column of a few entries in memory?
>> 
>> I put together a draft here (1). It would only ever be possible to
>> implicitly allow filtering on virtual tables, and it would be the
>> implementer's responsibility to decide that, per table.
>> 
>> For all virtual tables we currently have, I would enable this everywhere. I 
>> do not think there is any virtual table where we would not want to enable it 
>> or where people HAVE TO specify that.
>> 
>> (1) https://github.com/apache/cassandra/pull/2131