Re: Thoughts on a possible query language

Ian Holsman Mon, 22 Jun 2009 15:13:59 -0700

Hey.
Any chance of using Hypertable's or HBase's query language as a base?


http://code.google.com/p/hypertable/wiki/HQLTutorial
http://wiki.apache.org/hadoop/Hbase/HbaseShell.

Both of these are column-oriented DBs, which would have similar semantics to ours.

If possible, I want to avoid yet another tool-specific query language creeping up.

That said, I don't have the time to code it, so take it as a wish, and I will be happy with anything that makes Cassandra easier to use.


On 23/06/2009, at 4:42 AM, Sandeep Tata wrote:

There is some (unfinished) code in the current repo for CQL, a SQL-like
Cassandra Query Language that is super simple and (AFAIK) limited to
single-node queries.

I suspect there are bigger questions to tackle before we get to query
languages in the sense we're talking about:

1. Data model: Cassandra's values are byte arrays. Any proposal for a
language needs to figure out precisely what data model you're planning
to support (your examples include numbers, dates, strings).
2. Secondary indexes.
3. Query runtime (queries that run on a single node, multiple nodes, a
query optimizer?).
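To make the data-model point concrete: since Cassandra stores every value as an opaque byte array, a predicate like `age > 32` only means something once the query layer knows how to decode each column. Below is a minimal Python sketch, assuming a hypothetical per-column type schema (`SCHEMA`) and made-up encodings (UTF-8 text, 8-byte big-endian signed integers, ISO-8601 dates); none of this is part of Cassandra itself.

```python
from datetime import date

# Hypothetical schema: Cassandra only stores byte arrays, so a query
# layer would need to know how to interpret each column's bytes.
SCHEMA = {
    "first_name": "utf8",
    "age": "int64",
    "created_at": "date",
}


def decode(column: str, raw: bytes):
    """Turn a raw column value into a typed Python value (assumed encodings)."""
    kind = SCHEMA[column]
    if kind == "utf8":
        return raw.decode("utf-8")
    if kind == "int64":
        # Assumed encoding: 8-byte big-endian signed integer.
        return int.from_bytes(raw, "big", signed=True)
    if kind == "date":
        # Assumed encoding: ISO-8601 text such as b"2009-08-15".
        return date.fromisoformat(raw.decode("ascii"))
    raise ValueError(f"unknown type {kind!r} for column {column!r}")


# Compared as raw bytes, numbers stored as text sort incorrectly...
assert b"100" < b"32"
# ...whereas decoding first restores numeric order.
assert decode("age", (100).to_bytes(8, "big", signed=True)) > 32
```

The last two lines show why the typing question has to be settled first: compared byte by byte, the text "100" sorts before "32".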

I've never understood the value of a parallel-programming abstraction
(map-reduce) for a single-node database (CouchDB)... and I certainly
don't think we're ready to build a map-reduce view engine *in*
Cassandra right now.

IMHO, there are a bunch of interesting issues we will need to solve
before we can seriously talk about a query language.

On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo <[email protected]> wrote:

Has anyone given thought to how an SQL-like query language could be
integrated into Cassandra?

I'm thinking of something which would let you evaluate a limited set
of relational select operators. For example:

* first_name = 'Bob'
* age > 32
* created_at between '2009-08' and '2009-09'
* employer_id in (34543, 13177, 9338)
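One way to picture this limited set of operators is as small predicate functions over a row whose values have already been decoded into typed values. The sketch below is illustrative Python only: the helper names (`eq`, `gt`, `between`, `in_`, `all_of`) are hypothetical, and it combines the four example operators with AND purely for demonstration.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def eq(column: str, value: Any) -> Predicate:
    """column = value"""
    return lambda row: column in row and row[column] == value


def gt(column: str, value: Any) -> Predicate:
    """column > value"""
    return lambda row: column in row and row[column] > value


def between(column: str, low: Any, high: Any) -> Predicate:
    """column BETWEEN low AND high (inclusive on both ends, as in SQL)."""
    return lambda row: column in row and low <= row[column] <= high


def in_(column: str, values: Iterable[Any]) -> Predicate:
    """column IN (values)"""
    allowed = set(values)
    return lambda row: column in row and row[column] in allowed


def all_of(*predicates: Predicate) -> Predicate:
    """AND together several predicates."""
    return lambda row: all(p(row) for p in predicates)


query = all_of(
    eq("first_name", "Bob"),
    gt("age", 32),
    # ISO-8601 strings compare correctly as plain strings.
    between("created_at", "2009-08", "2009-09"),
    in_("employer_id", [34543, 13177, 9338]),
)

rows = [
    {"first_name": "Bob", "age": 40, "created_at": "2009-08-15", "employer_id": 13177},
    {"first_name": "Bob", "age": 30, "created_at": "2009-08-20", "employer_id": 34543},
    {"first_name": "Alice", "age": 45, "created_at": "2009-08-01", "employer_id": 9338},
]
matches = [row for row in rows if query(row)]
```

Only the first row satisfies all four conditions; the second fails `age > 32` and the third fails `first_name = 'Bob'`.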

First, is such functionality desired within the framework of
Cassandra, or do people prefer to keep this functionality in a
completely separate server component? There are pros and cons to keeping
queries inside Cassandra. I could enumerate them, but I would like to
hear other people's thoughts first.

An alternative to a text-based query syntax would be to borrow
CouchDB's concept of views [1]. In CouchDB, views are pre-defined
indexes which are populated by filtering data through a pair of
map/reduce functions, which are usually written in JavaScript. Views
are somewhat limited in expressiveness and flexibility, and do not
address all possible use cases, but they are very efficient to compute
and store, and are a fairly elegant system.
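A rough sketch of the view idea, with Python swapped in for CouchDB's JavaScript: a map function emits `(key, value)` pairs per document, the emitted pairs are kept sorted by key, and queries become key-range lookups with an optional reduce over the matching values. The `View` class and `by_age` function are hypothetical. Unlike CouchDB, this rebuilds the whole index instead of updating it incrementally, and it uses a simplified reduce signature.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterable[Tuple[Any, Any]]]
ReduceFn = Callable[[List[Any]], Any]


class View:
    """A pre-computed index built by running a map function over documents."""

    def __init__(self, map_fn: MapFn, reduce_fn: Optional[ReduceFn] = None):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.entries: List[Tuple[Any, str, Any]] = []  # (key, doc_id, value)
        self.keys: List[Any] = []  # parallel list of keys for bisection

    def build(self, docs: Dict[str, Doc]) -> None:
        emitted = [
            (key, doc_id, value)
            for doc_id, doc in docs.items()
            for key, value in self.map_fn(doc)
        ]
        # Sort by (key, doc_id) so values never need to be comparable.
        emitted.sort(key=lambda entry: (entry[0], entry[1]))
        self.entries = emitted
        self.keys = [key for key, _, _ in emitted]

    def query(self, start: Any, end: Any) -> Any:
        """Return entries with start <= key <= end, reduced if a reducer is set."""
        lo = bisect_left(self.keys, start)
        hi = bisect_right(self.keys, end)
        rows = self.entries[lo:hi]
        if self.reduce_fn is None:
            return rows
        return self.reduce_fn([value for _, _, value in rows])


def by_age(doc: Doc):
    # Map function: index documents by age, emitting the first name as value.
    if "age" in doc:
        yield doc["age"], doc.get("first_name")


docs = {
    "u1": {"first_name": "Bob", "age": 40},
    "u2": {"first_name": "Ann", "age": 25},
    "u3": {"first_name": "Cy", "age": 33},
}
ages = View(by_age)
ages.build(docs)
counts = View(by_age, reduce_fn=len)
counts.build(docs)
```

Here `ages.query(33, 100)` walks a contiguous slice of the sorted index instead of scanning every document, which is where the "efficient to compute and store" property comes from.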

Some challenges come to mind:

Cassandra's distributed nature means that a node's queryable indexes
can/should only reference data in that same node's partition, and that
a query might have to be executed on multiple nodes. For performance,
the query processing needs to be parallelized and pipelined.
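A minimal scatter/gather sketch of that execution model, assuming a hypothetical in-memory `Node` class standing in for a Cassandra node that can only filter its own partition. The coordinator sends the predicate to every node in parallel and merges partial results as each node finishes; pipelining is only approximated here by consuming results in completion order via `concurrent.futures.as_completed`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Node:
    """Hypothetical stand-in for a node holding one partition of the data."""

    def __init__(self, name: str, rows: List[Row]):
        self.name = name
        self.rows = rows

    def local_query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # A node can only consult data (and indexes) in its own partition.
        return [row for row in self.rows if predicate(row)]


def scatter_gather(nodes: List[Node], predicate: Callable[[Row], bool]) -> List[Row]:
    """Run the predicate on every node in parallel and merge the results."""
    if not nodes:
        return []
    results: List[Row] = []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(node.local_query, predicate) for node in nodes]
        # Merge partial results as each node finishes instead of waiting
        # for the slowest node first.
        for future in as_completed(futures):
            results.extend(future.result())
    return results


nodes = [
    Node("A", [{"first_name": "Foo", "age": 40}]),
    Node("B", [{"first_name": "Bar", "age": 20}, {"first_name": "Foo", "age": 50}]),
]
found = scatter_gather(nodes, lambda row: row["first_name"] == "Foo")
```

Because nodes finish in arbitrary order, the merged result has no defined order; a real coordinator would also need timeouts and handling for unreachable nodes.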

Could a query planner/optimizer reduce the number of nodes required
to satisfy a query by looking at the distribution of column values
across nodes? For example, if the column "first_name" value "Foo"
only occurs on node A, there's no need to involve node B. But such
knowledge requires maintaining statistics on each node that cover all
known peers, and the statistics must be kept up to date to avoid
glaring consistency issues.
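A sketch of the pruning idea, assuming each node publishes hypothetical per-column statistics. Here they are exact sets of values; a real system would more likely use something compact and approximate, such as a Bloom filter. The planner keeps any node that might hold a match, including nodes it has no statistics for. Stale statistics are exactly where the consistency risk mentioned above comes in.

```python
from typing import Dict, List, Set

# Hypothetical per-node statistics: column name -> set of values the node
# currently holds for that column.
NodeStats = Dict[str, Set[object]]


def plan_equality_query(
    stats: Dict[str, NodeStats], column: str, value: object
) -> List[str]:
    """Return the nodes that may hold rows where column == value.

    A node with no statistics for the column is kept, because it cannot be
    ruled out. If statistics are stale, a node that recently gained a
    matching row could be wrongly skipped.
    """
    targets = []
    for node, columns in sorted(stats.items()):
        known = columns.get(column)
        if known is None or value in known:
            targets.append(node)
    return targets


stats = {
    "A": {"first_name": {"Foo", "Bob"}},
    "B": {"first_name": {"Bob"}},
    "C": {},  # no statistics yet, so C can never be pruned
}
plan = plan_equality_query(stats, "first_name", "Foo")
```

With these statistics, a lookup for "Foo" skips node B, matching the example in the message, while node C is still queried because nothing is known about it.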

Given the nature of Cassandra's column families, it's not immediately
obvious to me how to best address columns in such a language.

[1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

A.


--
Ian Holsman
[email protected]
