This may sound a bit harsh, but I teach my developers that if they are trying
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for
its high availability and scalability characteristics. We love no downtime.
ALLOW FILTERING breaks the rules of availability and
Hi Shalom,
Thanks for your notes! So you also experienced this thing... fine
Then maybe the best rules to follow are these:
a) never(!) run an ALLOW FILTERING query on a production cluster
b) if you need these queries, build a test cluster (somehow) and mirror
the data (somehow) OR add
Hi Attila,
I'm definitely no guru, but I've experienced several cases where people at
my company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large
partitions, performance will decrease.
GC can be affected. And if GC
Hi Gurus,
Looks like this thread has stalled. However, I would be very curious to hear
answers regarding b) ...
Does anyone have any comments on that?
I do see this as a potential production outage risk now... Especially as
we are planning to run analysis queries by hand exactly like that over
the cluster...
Hi again,
so remaining with a) for a second...
"Why am I using ALLOW FILTERING in the first place?"
Fully agreed! To put it this way: as a reviewer I never want to see the
string "allow filtering" occur in any SELECT issued by production
code. I clearly consider it an indicator of a wrong
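
The review rule above (no "allow filtering" string in production SELECTs) could be checked mechanically. This is only a minimal sketch: the function name find_allow_filtering is hypothetical, and it looks at one line at a time, so a clause split across two source lines would be missed.

```python
import re

# Matches the CQL clause regardless of case or the amount of whitespace
# between the two words.
ALLOW_FILTERING = re.compile(r"\bALLOW\s+FILTERING\b", re.IGNORECASE)


def find_allow_filtering(source: str) -> list[int]:
    """Return the 1-based line numbers on which ALLOW FILTERING appears.

    Hypothetical helper for a review or CI step; it inspects each line
    separately and does not parse CQL.
    """
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if ALLOW_FILTERING.search(line)
    ]
```

A reviewer could run this over the files changed in a commit and reject the change if the returned list is not empty.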
a) Interesting... But only if you do not provide the partition key,
right? (So IN() is for the partition key?)
I think you should ask yourself a different question. Why am I using ALLOW
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to
Hi,
"When you run a query with allow filtering, Cassandra doesn't know where
the data is located, so it has to go node by node, searching for the
requested data."
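
The quoted point can be illustrated with a toy Python model. This is not Cassandra code: the hash-based owner() function, the 8-node list (chosen to match the cluster size mentioned in this thread), and the sample rows are all assumptions, and replication is ignored entirely.

```python
import hashlib

NODES = 8  # assumed cluster size; replication is deliberately ignored


def owner(partition_key: str) -> int:
    """Map a partition key to one node (toy stand-in for token placement)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NODES


# Each "node" is a dict of {partition_key: row}.
cluster = [dict() for _ in range(NODES)]
for i in range(1000):
    key = f"user-{i}"
    cluster[owner(key)][key] = {"id": key, "city": "A" if i % 100 == 0 else "B"}


def select_by_key(key: str):
    """Partition key known: the owning node is computed, only it is asked."""
    node = owner(key)
    return cluster[node].get(key), 1  # (row, nodes touched)


def select_with_filter(predicate):
    """No partition key: every node has to scan every row it holds."""
    hits, nodes_touched, rows_scanned = [], 0, 0
    for node in cluster:
        nodes_touched += 1
        for row in node.values():
            rows_scanned += 1
            if predicate(row):
                hits.append(row)
    return hits, nodes_touched, rows_scanned
```

In this model select_by_key("user-42") asks a single node, while select_with_filter(lambda r: r["city"] == "A") visits all 8 nodes and scans all 1000 rows to return 10 of them. The work grows with the total data size, not with the size of the result.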
a) Interesting... But only if you do not provide the partition key,
right? (So IN() is for the partition key?)
b) Still does
Hi Vsevolod,
1) Why does it behave this way? I thought any given SELECT request was
handled by a limited subset of C* nodes rather than all of them, as determined
by the connection's consistency level and the table's replication settings.
When you run a query with allow filtering, Cassandra doesn't know where the
data is located,
Hello everyone,
We have an 8-node C* cluster with a large volume of unbalanced data. The usual
per-partition selects work reasonably well and are processed by a limited
number of nodes, but if a user issues SELECT WHERE IN () ALLOW FILTERING,
the command stalls all 8 nodes, bringing them to a halt and leaving them unresponsive to