[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826335#comment-13826335 ]
Sylvain Lebresne commented on CASSANDRA-6348:
---------------------------------------------

bq. Other than hadoop queries, It's common for user to query on multiple indexes

I sure hope you're wrong, and it certainly shouldn't be common, because Cassandra sucks at it. And I personally have almost never seen anyone use it (on the mailing list, for instance). ALLOW FILTERING is really meant as "don't do this unless you're just having fun with cqlsh on a toy database". Using ALLOW FILTERING on real production queries is wrong (at least for CQL queries; I'm not talking about Hadoop, which is a different problem). I'm more than happy to make the documentation/message clearer about that fact if it isn't.

bq. Hadoop Cql query uses "ALLOW FILTERING"

Which is kind of a problem, in the sense that it's not what ALLOW FILTERING was intended for and that, more generally, CQL has never been designed with Hadoop in mind; it's a strictly real-time oriented language. So maybe we should re-purpose ALLOW FILTERING as "the Hadoop mode" somehow, but if we do, we should be explicit about it and think about how best to do that. But trying to shove Hadoop into something it hasn't been made for feels wrong to me.

That being said, I wonder if the "Hadoop wants to use the 2ndary indexes" problem couldn't be better solved, with an overall simpler solution, by letting it query the 2ndary index CFS directly. That is, allow selects on the index itself (which would obviously require a special flag to unlock). That way, Hadoop would get paging over the index "for free" (which, at the end of the day, is the problem that needs solving, if I understand it correctly) and would get control over that paging. And it would allow Hadoop to do things like merging indexes, which probably makes more sense on the Hadoop side than it does on the realtime side (i.e. we keep Cassandra focused on realtime queries with as little processing as possible, which is what it is good at).
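
As a rough illustration of the trade-off discussed above, here is a toy Python model. It is a sketch under stated assumptions, not Cassandra code: plain lists and dicts stand in for an index row and the base CF, and the names `filtered_scan` and `paged_index_scan` are hypothetical. It contrasts a single filtered request that must walk the whole index row when nothing matches with a caller that pages over the index in bounded chunks.

```python
# Toy model only: a list stands in for one index row, a dict for the base CF.
# None of these names are Cassandra APIs; they are hypothetical helpers.

def filtered_scan(index_row, base_cf, predicate, limit=1):
    """One request: walk the index row, reading the base CF for every entry,
    until `limit` matches are found or the index row is exhausted."""
    matches, base_reads = [], 0
    for key in index_row:
        base_reads += 1              # one base CF lookup per index entry
        if predicate(base_cf[key]):
            matches.append(key)
            if len(matches) >= limit:
                break
    return matches, base_reads


def paged_index_scan(index_row, page_size):
    """Caller-driven paging: each step returns at most `page_size` index entries,
    so the work per request is bounded regardless of the index row size."""
    for start in range(0, len(index_row), page_size):
        yield index_row[start:start + page_size]


# Every row has a = 1, so the index row for a = 1 holds all 10,000 keys.
base_cf = {i: {"a": 1, "b": i % 7} for i in range(10_000)}
index_row = list(base_cf)

# A filter that matches nothing forces a full scan inside a single request.
matches, base_reads = filtered_scan(index_row, base_cf, lambda row: row["b"] > 100)

# Paging over the index itself keeps each request small.
pages = list(paged_index_scan(index_row, page_size=100))
```

In this model the filtered request performs one base lookup per index entry before returning an empty result, which is the kind of unbounded work that can exceed a request timeout, while the paged scan lets the caller decide how much to process per step and where to do any further filtering or merging.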
> TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6348
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> If the index row is too big and filtering can't find a matching CQL row in the base
> CF, it keeps scanning the index row and retrieving from the base CF until the index row
> is scanned completely, which may take too long, and the thrift server returns a
> TimeoutException. This is one of the reasons why we shouldn't index a column
> if the index is too big.
> Merging multiple indexes can resolve the case where there are only EQUAL
> clauses (CASSANDRA-6048 addresses it).
> If the query has non-EQUAL clauses, we still need to do data filtering, which
> might lead to a timeout exception.
> We can either disable those kinds of queries or WARN the user that data
> filtering might lead to a timeout exception or OOM.

--
This message was sent by Atlassian JIRA
(v6.1#6144)