Re: Thoughts on a possible query language

Sandeep Tata Mon, 22 Jun 2009 11:42:55 -0700

There is some (unfinished) code in the current repo for CQL, a SQL-like
Cassandra Query Language that is super simple and (AFAIK) limited to
single-node queries.


I suspect there are bigger questions to tackle before we get to query
languages in the sense we're talking about:
1. Data model -- Cassandra's values are byte arrays. Any proposal for a
language needs to figure out precisely what data model you're planning to
support. (Your examples include numbers, dates, and strings.)
2. Secondary indexes
3. Query runtime (do queries run on a single node or across multiple nodes?
is there a query optimizer?)
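
To make point 1 concrete, here is a minimal Python sketch of why the data
model matters for a predicate like "age > 32". The two encodings below are
illustrative assumptions of mine, not anything Cassandra defines; the point is
only that a store that sees opaque byte arrays can compare them byte by byte
and nothing more.

```python
import struct

# Hypothetical encodings of the integer ages 9 and 32 as stored column values.
as_text = {9: b"9", 32: b"32"}
as_big_endian = {n: struct.pack(">I", n) for n in (9, 32)}


def bytes_greater(a: bytes, b: bytes) -> bool:
    """Compare raw values the only way an untyped store can: byte by byte."""
    return a > b


# Text encoding: b"9" sorts after b"32", so "age > 32" would wrongly match 9.
text_result = bytes_greater(as_text[9], as_text[32])

# Fixed-width big-endian unsigned encoding preserves numeric order.
packed_result = bytes_greater(as_big_endian[9], as_big_endian[32])
```

With the text encoding the comparison comes out the wrong way round; with the
fixed-width encoding it agrees with numeric order. Either way, the language
has to know which encoding a column uses before it can evaluate the predicate.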

I've never understood the value of a parallel-programming abstraction
(map-reduce) for a single-node database (CouchDB), and I certainly don't
think we're ready to build a map-reduce view engine *in* Cassandra right
now.

IMHO, there are a bunch of interesting issues we will need to solve before
we can seriously talk about a query language.


On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo <[email protected]> wrote:

> Has anyone given thought to how an SQL-like query language could be
> integrated into Cassandra?
>
> I'm thinking of something which would let you evaluate a limited set
> of relational select operators. For example:
>
>  * first_name = 'Bob'
>  * age > 32
>  * created_at between '2009-08' and '2009-09'
>  * employer_id in (34543, 13177, 9338)
>
> First, is such functionality desired within the framework of
> Cassandra, or do people prefer to keep this functionality in a
> completely separate server component? There are pros and cons to keeping
> queries inside Cassandra. I could enumerate them, but I would like to
> hear other people's thoughts first.
>
> An alternative to a text-based query syntax would be to borrow
> CouchDB's concept of views [1]. In CouchDB, views are pre-defined
> indexes which are populated by filtering data through a pair of
> map/reduce functions, which are usually written in JavaScript. Views
> are somewhat limited in expressiveness and flexibility, and do not
> address all possible use cases, but they are very efficient to compute
> and store, and are a fairly elegant system.
>
> Some challenges come to mind:
>
> Cassandra's distributed nature means that a node's queryable indexes
> can/should only reference data in that same node's partition, and that
> a query might have to be executed on multiple nodes. For performance,
> the query processing needs to be parallelized and pipelined.
>
> Could a query planner/optimizer reduce the number of nodes
> required to satisfy a query by looking at the distribution of column
> values across nodes? For example, if the column "first_name" value
> "Foo" only occurs on node A, there's no need to involve node B. But
> such knowledge requires the maintenance of statistics on each node
> that cover all known peers, and the statistics must be kept up to date
> to avoid glaring consistency issues.
>
> Given the nature of Cassandra's column families it's not immediately
> obvious to me how to best address columns in such a language.
>
> [1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>
> A.
>
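
On the planner question in the quoted message, the pruning idea can be
sketched roughly as follows. This is only an illustration under assumptions of
mine: the `stats` mapping (node name to the set of distinct "first_name"
values stored on that node) and the `nodes_to_query` helper are hypothetical,
not existing Cassandra code.

```python
from typing import Dict, List, Set

# Hypothetical per-node statistics: for each node, the set of distinct
# "first_name" values believed to be stored on that node.
NodeStats = Dict[str, Set[str]]


def nodes_to_query(stats: NodeStats, value: str) -> List[str]:
    """Return only the nodes whose statistics say the value may occur there."""
    return sorted(node for node, values in stats.items() if value in values)


stats: NodeStats = {"A": {"Foo", "Bob"}, "B": {"Bob"}}

only_a = nodes_to_query(stats, "Foo")   # "Foo" is recorded on node A only
both = nodes_to_query(stats, "Bob")     # "Bob" is recorded on both nodes
```

Note that if `stats` is stale, a node that has since received a matching row
is skipped and the query silently misses data, which is the consistency
concern raised in the quoted message.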
