Just make virtual table implementations decide? Add a method to the VirtualTable interface to indicate whether this is desirable, and call it a day?
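A minimal sketch of what that could look like, in Java. Everything here is hypothetical and only illustrates the idea: `SketchVirtualTable`, `allowFilteringImplicitly()`, `FilteringCheck.mayFilter()` and the two example tables are invented names, not Cassandra's actual interface, and the conservative default of `false` is an assumption rather than something the thread has agreed on.

```java
// Hypothetical stand-in for a virtual table interface; not Cassandra's real API.
interface SketchVirtualTable {
    String name();

    // Each implementation decides whether restrictions on non-primary-key
    // columns may be evaluated without an explicit ALLOW FILTERING clause.
    // Assumed default: keep standard CQL semantics unless a table opts in.
    default boolean allowFilteringImplicitly() {
        return false;
    }
}

// A table known to hold only a handful of rows in memory opts in.
final class SmallInMemoryTable implements SketchVirtualTable {
    @Override
    public String name() {
        return "small_in_memory_table";
    }

    @Override
    public boolean allowFilteringImplicitly() {
        return true;
    }
}

// A table that could return a very large result set keeps the default.
final class PotentiallyLargeTable implements SketchVirtualTable {
    @Override
    public String name() {
        return "potentially_large_table";
    }
}

final class FilteringCheck {
    private FilteringCheck() {
    }

    // Filtering is permitted when the query asks for it explicitly
    // or when the table itself declares it harmless.
    static boolean mayFilter(SketchVirtualTable table, boolean queryHasAllowFiltering) {
        return queryHasAllowFiltering || table.allowFilteringImplicitly();
    }
}
```

With this shape, a query without ALLOW FILTERING would pass the check for `SmallInMemoryTable` and be rejected for `PotentiallyLargeTable`, while an explicit ALLOW FILTERING keeps working everywhere.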
> On 6 Feb 2023, at 09:41, Benjamin Lerer <b.le...@gmail.com> wrote:
>
> Making ALLOW FILTERING a table option implies giving the person creating
> the table the ability to change the way the server will behave for that
> table, which might not be something that every C* operator wants. Of
> course we can allow operators to control that through the ALLOW FILTERING
> guardrail. At that point we would also need to have a default setting for
> the entire database.
>
> On Fri, 3 Feb 2023 at 23:44, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
>> This is the draft for FILTERING ON|OFF in shell.
>>
>> I would say this is the simplest solution.
>>
>> We may still consider a table option, but what do you think about having it
>> simply set via the shell?
>>
>> https://github.com/apache/cassandra/pull/2141/files
>>
>> ________________________________________
>> From: Josh McKenzie <jmcken...@apache.org>
>> Sent: Friday, February 3, 2023 23:39
>> To: dev
>> Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables
>>
>> NetApp Security WARNING: This is an external email. Do not click links or
>> open attachments unless you recognize the sender and know the content is
>> safe.
>>
>> they would start to set ALLOW FILTERING here and there in order to not think
>> twice about their data model so they can just call it a day.
>> Setting this on a per-table basis or having users set this on specific
>> queries that hit tables and forgetting they set it are 6 of one and
>> half-a-dozen of another.
>>
>> I like the table property idea personally. That communicates an intent about
>> the data model and expectation of the size and usage of data in the modeling
>> of the schema that embeds some context and intent there's currently no
>> mechanism to communicate.
>>
>> On Fri, Feb 3, 2023, at 5:00 PM, Miklosovic, Stefan wrote:
>> Yes, there would be discrepancy.
>> I do not like that either. If it was only about "normal tables vs virtual
>> tables", I could live with that. But the fact that there are going to be
>> differences among vtables themselves starts to be a little bit messy. Then
>> we would need to let operators know which tables are always allowed to be
>> filtered on and which are not, and that just complicates it. Putting that
>> information in a comment so it is visible in DESCRIBE is a nice idea.
>>
>> That flag we talk about ... that flag would be used purely internally, it
>> would not be in the schema to be gossiped.
>>
>> Also, I am starting to like the suggestion to have something like ALLOW
>> FILTERING ON in CQLSH so it would be turned on for the whole CQL session.
>> That leaves tables as they are and it should not be a big deal for
>> operators to set. We would have to make sure to add the "ALLOW FILTERING"
>> clause to every SELECT statement (to virtual tables only?) a user submits.
>> I am not sure if this is doable yet though.
>>
>> ________________________________________
>> From: David Capwell <dcapw...@apple.com>
>> Sent: Friday, February 3, 2023 22:42
>> To: dev
>> Cc: Maxim Muzafarov
>> Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables
>>
>> I don't think the assumption that "virtual tables will always be small and
>> always fit in memory" is a safe one.
>>
>> Agree, there is a repair ticket to have the coordinating node do network
>> queries to peers to resolve the table (rather than operator querying
>> everything, allow the coordinator node to do it for you)… so this assumption
>> may not be true down the line.
>>
>> I could be open to a table property that says ALLOW FILTERING on by default
>> or not… then we can pick and choose vtables (or have vtables opt-out)…. I
>> kinda like like the lack of consistency with this approach though
>>
>> On Feb 3, 2023, at 11:24 AM, C. Scott Andreas <sc...@paradoxica.net> wrote:
>>
>> There are some ideas that development community members have kicked around
>> that may falsify the assumption that "virtual tables are tiny and will fit
>> in memory."
>>
>> One example is CASSANDRA-14629: Abstract Virtual Table for very large result
>> sets
>> https://issues.apache.org/jira/browse/CASSANDRA-14629
>>
>> Chris's proposal here is to enable query results from virtual tables to be
>> streamed to the client rather than being fully materialized. There are some
>> neat possibilities suggested in this ticket, such as debug functionality to
>> dump the contents of a raw SSTable via the CQL interface, or the contents of
>> the database's internal caches. One could also imagine a feature like this
>> providing functionality similar to a foreign data wrapper in other databases.
>>
>> I don't think the assumption that "virtual tables will always be small and
>> always fit in memory" is a safe one.
>>
>> I don't think we should implicitly add "ALLOW FILTERING" to all queries
>> against virtual tables because of this, in addition to concern with
>> departing from standard CQL semantics for a type of tables deemed special.
>>
>> – Scott
>>
>> On Feb 3, 2023, at 6:52 AM, Maxim Muzafarov <mmu...@apache.org> wrote:
>>
>> Hello Stefan,
>>
>> Regarding the decision to implicitly enable ALLOW FILTERING for
>> virtual tables, which also makes sense to me, it may be necessary to
>> consider changing the clustering columns in the virtual table metadata
>> to regular columns as well.
>> The reasons are the same as mentioned
>> earlier: the virtual tables hold their data in memory, thus we do not
>> benefit from the advantages of ordered data (e.g. the ClientsTable and
>> its ClusteringColumn(PORT)).
>>
>> Changing the clustering column to a regular column may simplify the
>> virtual table data model, but I'm afraid it may affect users who rely
>> on the table metadata.
>>
>> On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña <adelap...@apache.org> wrote:
>>
>> I think removing the need for ALLOW FILTERING on virtual tables makes sense
>> and would be quite useful for operators.
>>
>> That guard exists for performance issues that shouldn't occur on virtual
>> tables. We also have a flag in case some future virtual table implementation
>> has limitations regarding filtering, although it seems it's not the case
>> with any of the existing virtual tables.
>>
>> It is not like we would promote bad habits because virtual tables are meant
>> to be queried by operators / administrators only.
>>
>> It might even be quite the opposite, since in the current situation users
>> might get used to routinely using ALLOW FILTERING for querying their virtual
>> tables.
>>
>> It has been mentioned on the #cassandra-dev Slack thread where this started
>> (1) that it's kind of an API inconsistency to allow querying by non-primary
>> keys on virtual tables without ALLOW FILTERING, whereas it's required for
>> regular tables. I think that a simple doc update saying that virtual tables,
>> which are not regular tables, support filtering would be enough. Virtual
>> tables are well identified by both the keyspace they belong to and doc, so
>> users shouldn't have trouble knowing whether a table is virtual. It would be
>> similar to the current exception for ALLOW FILTERING, where one needs to use
>> it unless the table has an index for the queried column.
>>
>> (1) https://the-asf.slack.com/archives/CK23JSY2K/p1675352759267329
>>
>> On Fri, 3 Feb 2023 at 09:09, Miklosovic, Stefan
>> <stefan.mikloso...@netapp.com> wrote:
>>
>> Hi list,
>>
>> the content of virtual tables is held in memory (and / or is fetched every
>> time upon request). While doing queries against such a table for a column
>> outside of the primary key, normally, users are required to specify ALLOW
>> FILTERING. This makes total sense for "ordinary tables" for applications to
>> have performant and effective queries, but it kind of loses its
>> applicability for virtual tables when they literally hold just a handful of
>> entries in memory and it just does not matter, does it?
>>
>> What do you think about implicitly allowing filtering for virtual tables so
>> we save ourselves from these pesky errors when we want to query an arbitrary
>> column and we need to satisfy the CQL spec just to do that?
>>
>> It is not like we would promote bad habits because virtual tables are meant
>> to be queried by operators / administrators only.
>>
>> We can also explicitly document this behavior.
>>
>> Among other options, we may try to implement secondary indices on virtual
>> tables but I am not completely sure this is what we want because of its
>> complexity etc. Is it even necessary to put such complex logic in place just
>> to be able to select any column on a few entries in memory?
>>
>> I put together a draft here (1). It would be possible to implicitly
>> allow filtering on virtual tables only, and it would be the implementor's
>> responsibility to decide that, per table.
>>
>> For all virtual tables we currently have, I would enable this everywhere. I
>> do not think there is any virtual table where we would not want to enable it
>> or where people HAVE TO specify that.
>>
>> (1) https://github.com/apache/cassandra/pull/2131