hey.
Any chance of using Hypertable's or HBase's query language as a base?

http://code.google.com/p/hypertable/wiki/HQLTutorial
http://wiki.apache.org/hadoop/Hbase/HbaseShell

Both of these are column-oriented DBs, which would have similar semantics to ours.

If possible, I want to avoid yet another tool-specific query language creeping in.

Having said that, I don't have the time to code it, so take it as a wish; I will be happy with anything that makes Cassandra easier to use.

On 23/06/2009, at 4:42 AM, Sandeep Tata wrote:

There is some (unfinished) code in the current repo on CQL, a SQL-like
Cassandra Query Language that is super simple and (AFAIK) limited to
single-node queries.

I suspect there are bigger questions to tackle before we get to query
languages in the sense we're talking about:
1. Data model: Cassandra's values are byte arrays. Any proposal for a
language needs to figure out precisely what data model you're planning to
support. (Your examples include numbers, dates, strings.)
2. Secondary indexes
3. Query runtime (queries that run on a single node, multiple nodes, query
optimizer?)
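
To make the data-model point concrete, here is a minimal sketch of why raw byte arrays are not enough for a predicate like "age > 32". The fixed-width big-endian encoding and the helper names encode_int/decode_int are assumptions for illustration only, not anything Cassandra defines.

```python
import struct

# Hypothetical encoding (an assumption): big-endian signed 64-bit integers.
def encode_int(n: int) -> bytes:
    return struct.pack(">q", n)

def decode_int(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]

# If numbers are stored as text, byte-wise comparison gives the wrong answer:
# b"9" sorts after b"32" because the byte '9' is greater than the byte '3'.
text_says_9_gt_32 = b"9" > b"32"

# Once the language knows the column holds integers, it can decode first.
typed_says_9_gt_32 = decode_int(encode_int(9)) > decode_int(encode_int(32))
```

The first comparison is true and the second is false, which is the gap a declared data model would have to close.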

I've never understood the value of a parallel-programming abstraction
(map-reduce) for a single-node database (CouchDB) ... and I certainly
don't think we're ready to build a map-reduce view engine *in* Cassandra
right now.

IMHO, there are a bunch of interesting issues we will need to solve before
we can seriously talk about a query language.


On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo <[email protected]> wrote:

Has anyone given thought to how an SQL-like query language could be
integrated into Cassandra?

I'm thinking of something which would let you evaluate a limited set
of relational select operators. For example:

* first_name = 'Bob'
* age > 32
* created_at between '2009-08' and '2009-09'
* employer_id in (34543, 13177, 9338)
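
As a rough sketch of what evaluating the four example operators above could look like, assuming rows have already been decoded into a Python dict of typed values (the helper names eq, gt, between, is_in and matches are hypothetical, and dates are assumed to be ISO-8601 strings so that they compare lexicographically):

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

def eq(col: str, value: Any) -> Predicate:
    return lambda row: col in row and row[col] == value

def gt(col: str, value: Any) -> Predicate:
    return lambda row: col in row and row[col] > value

def between(col: str, lo: Any, hi: Any) -> Predicate:
    # Inclusive on both ends, as in SQL's BETWEEN.
    return lambda row: col in row and lo <= row[col] <= hi

def is_in(col: str, values: Iterable[Any]) -> Predicate:
    allowed = frozenset(values)
    return lambda row: col in row and row[col] in allowed

def matches(row: Row, predicates: Iterable[Predicate]) -> bool:
    # A row matches when every predicate holds (implicit AND).
    return all(p(row) for p in predicates)

query = [
    eq("first_name", "Bob"),
    gt("age", 32),
    between("created_at", "2009-08", "2009-09"),
    is_in("employer_id", [34543, 13177, 9338]),
]

row = {"first_name": "Bob", "age": 40,
       "created_at": "2009-08-15", "employer_id": 13177}
row_matches = matches(row, query)
```

A row missing a column simply fails any predicate on that column, which is one possible choice for sparse column families.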

First, is such functionality desired within the framework of
Cassandra, or do people prefer to keep this functionality in a
completely separate server component? There are pros and cons to keeping
queries inside Cassandra. I could enumerate them, but I would like to
hear other people's thoughts first.

An alternative to a text-based query syntax would be to borrow
CouchDB's concept of views [1]. In CouchDB, views are pre-defined
indexes which are populated by filtering data through a pair of
map/reduce functions, which are usually written in JavaScript. Views
are somewhat limited in expressiveness and flexibility, and do not
address all possible use cases, but they are very efficient to compute
and store, and are a fairly elegant system.
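
To illustrate the idea of a view, here is a simplified sketch in Python rather than JavaScript. It rebuilds the whole view from scratch, whereas CouchDB maintains its views incrementally, and the names map_doc, reduce_values and build_view are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Dict[str, Any]

def map_doc(doc: Doc) -> Iterator[Tuple[Any, int]]:
    # Emit one (key, value) pair per document that has an employer.
    if "employer_id" in doc:
        yield doc["employer_id"], 1

def reduce_values(values: List[int]) -> int:
    # Count the documents emitted under each key.
    return sum(values)

def build_view(docs: Iterable[Doc]) -> Dict[Any, int]:
    grouped: Dict[Any, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    # Keep keys sorted so the view can answer key-range lookups.
    return {key: reduce_values(vals) for key, vals in sorted(grouped.items())}

docs = [
    {"first_name": "Bob", "employer_id": 13177},
    {"first_name": "Ann", "employer_id": 34543},
    {"first_name": "Joe", "employer_id": 13177},
    {"first_name": "Sue"},
]
view = build_view(docs)
```

Here the view maps each employer_id to the number of matching documents; the query is fixed when the view is defined, which is the loss of flexibility mentioned above.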

Some challenges come to mind:

Cassandra's distributed nature means that a node's queryable indexes
can/should only reference data in that same node's partition, and that
a query might have to be executed on multiple nodes. For performance,
the query processing needs to be parallelized and pipelined.
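
A minimal scatter-gather sketch of that idea follows. Nodes are modelled as in-memory lists and the "network call" is a local function, so query_node, scatter_gather and the partitions layout are all assumptions for illustration, not Cassandra's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

def query_node(partition: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    # Each node filters only the rows in its own partition.
    return [row for row in partition if predicate(row)]

def scatter_gather(partitions: Dict[str, List[Row]],
                   predicate: Callable[[Row], bool]) -> List[Row]:
    # Send the same predicate to every node in parallel, then merge results.
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_node = pool.map(query_node,
                            partitions.values(),
                            [predicate] * len(partitions))
        return [row for rows in per_node for row in rows]

partitions = {
    "A": [{"first_name": "Bob", "age": 40}],
    "B": [{"first_name": "Foo", "age": 50}, {"first_name": "Bob", "age": 20}],
}
bobs = scatter_gather(partitions, lambda row: row["first_name"] == "Bob")
```

This only covers the parallel part; pipelining would mean streaming rows back as each node produces them instead of waiting for complete per-node lists.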

Could a query planner/optimizer reduce the number of nodes
required to satisfy a query by looking at the distribution of column
values across nodes? For example, if the column "first_name" value
"Foo" only occurs on node A, there's no need to involve node B. But
such knowledge requires the maintenance of statistics on each node
that cover all known peers, and the statistics must be kept up to date
to avoid glaring consistency issues.
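
The pruning idea can be sketched as follows, assuming each node publishes the exact set of values it holds for a column. That is an assumption for illustration (a real system would need a far more compact summary), and build_stats and nodes_for are hypothetical names.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]

def build_stats(partitions: Dict[str, List[Row]], column: str) -> Dict[str, Set[Any]]:
    # Per-node statistics: which values of `column` occur on which node.
    return {
        node: {row[column] for row in rows if column in row}
        for node, rows in partitions.items()
    }

def nodes_for(stats: Dict[str, Set[Any]], value: Any) -> List[str]:
    # Only nodes whose statistics mention the value need to be queried.
    return sorted(node for node, values in stats.items() if value in values)

partitions = {
    "A": [{"first_name": "Foo"}, {"first_name": "Bob"}],
    "B": [{"first_name": "Bob"}],
}
stats = build_stats(partitions, "first_name")
foo_nodes = nodes_for(stats, "Foo")

# A write that lands after the statistics were built is invisible to the
# planner until they are refreshed, which is the consistency risk noted above.
partitions["B"].append({"first_name": "Foo"})
stale_foo_nodes = nodes_for(stats, "Foo")
fresh_foo_nodes = nodes_for(build_stats(partitions, "first_name"), "Foo")
```

With stale statistics the planner still routes the "Foo" query to node A only and misses the new row on node B.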

Given the nature of Cassandra's column families it's not immediately
obvious to me how to best address columns in such a language.

[1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

A.


--
Ian Holsman
[email protected]


