Re: Cassandra data model right definition

Russell Bradberry Fri, 30 Sep 2016 14:18:28 -0700

I agree 100%; this misunderstanding really bothers me as well.  I like the term 
“Partitioned Row Store”, even though I am guilty of using the legacy 
“Column-Family Store” from darker times.  Even databases like Scylla, which is 
supposed to be an Apache Cassandra clone, tout themselves as column stores, 
which is just utterly backwards, as you mentioned.

From: Benedict Elliott Smith <bened...@apache.org>
Reply-To: <user@cassandra.apache.org>
Date: Friday, September 30, 2016 at 5:12 PM
To: <user@cassandra.apache.org>
Subject: Re: Cassandra data model right definition

Absolutely.  A "partitioned row store" is exactly what I would call it.  As it 
happens, our README thinks the same, which is fantastic.  

I thought I'd take a look at the rest of our cohort, and didn't get far before 
being disappointed.  HBase literally calls itself a "column-oriented store", 
which is so totally wrong it's simultaneously hilarious and tragic.  

I guess we can't blame the wider internet for misunderstanding/misnaming us 
poor "wide column stores" if even one of the major examples doesn't know what 
it, itself, is!

On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:

+1000 to what Benedict says. I usually call it a "partitioned row store", which 
often needs some extra explanation but is more accurate than "column family" 
or whatever other Thrift-era terminology people still use. 

On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:

I used to present Cassandra as a NoSQL datastore with "distributed" tables. This 
definition is closer to CQL and has some academic background (the distributed 
hash table).
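The distributed-hash-table analogy is visible directly from CQL: token() exposes the hash of a row's partition key, and that hash is what determines which nodes own the row. A sketch, assuming the foo table from Joaquin's message further down this thread and the default Murmur3Partitioner:

```cql
-- Sketch only: assumes the foo table defined later in this thread,
-- whose partition key is the composite (bar, boz).
-- token() applies the cluster's partitioner to the partition key columns;
-- the resulting token decides which nodes store the partition,
-- much like the key hash in a distributed hash table.
SELECT token(bar, boz), bar, boz, baz FROM foo LIMIT 5;
```

Rows that share bar and boz always produce the same token, which is why they land on the same replicas.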

On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> wrote:

Cassandra is not a "wide column store" anymore.  It has a schema.  Only Thrift 
users still think they don't have a schema (though they do), and Thrift is being 
deprecated.
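One way to see that CQL tables really do have a schema is that a write to an undeclared column is rejected until the column is added with ALTER TABLE. A minimal sketch, assuming the foo table defined later in this thread and Cassandra 2.0.7+ for the uuid() function. data3 is a hypothetical column name:

```cql
-- Hypothetical: assumes the foo table defined later in this thread.
-- Every CQL write is validated against the table schema, so a column
-- that was never declared is rejected with an error.
INSERT INTO foo (bar, boz, baz, data3)
VALUES (uuid(), uuid(), now(), 'x');   -- rejected: data3 is not defined

-- The schema must be changed explicitly before the column can be written.
ALTER TABLE foo ADD data3 text;

INSERT INTO foo (bar, boz, baz, data3)
VALUES (uuid(), uuid(), now(), 'x');   -- accepted after the ALTER
```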

I really wish everyone would kill the term "wide column store" with fire.  It 
seems to have never meant anything beyond "schema-less, row-oriented", and a 
"column store" means literally the opposite of this.

Not only that, but people don't even seem to realise that the term "column store" 
existed long before "wide column store", and the latter is often abbreviated to 
the former, as here: http://www.planetcassandra.org/what-is-nosql/ 

Since it no longer applies, let's all agree as a community to forget this awful 
nomenclature ever existed.

On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> wrote:

Hi Mehdi,

I can help clarify a few things.

As Carlos said, Cassandra is a Wide Column Store. Theoretically, a row can have 
2 billion columns, but in practice it shouldn't have more than 100 million 
columns.

Cassandra distributes data to specific nodes based on the partition key(s), and 
also lets you define zero or more clustering keys. Together, the partition 
key(s) and clustering key(s) form the primary key.
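The primary key shapes described above can be sketched with hypothetical tables (none of these names come from the thread). When the partition key has more than one column, inner parentheses group those columns, and any columns after the group are clustering keys:

```cql
-- Hypothetical tables illustrating partition keys vs clustering keys.

-- Single-column partition key, no clustering key:
-- exactly one CQL row per partition.
CREATE TABLE users (
  user_id uuid,
  name text,
  PRIMARY KEY (user_id)
);

-- Composite partition key (inner parentheses), no clustering key.
CREATE TABLE users_by_region (
  region text,
  user_id uuid,
  name text,
  PRIMARY KEY ((region, user_id))
);

-- Partition key plus one clustering key: many CQL rows per partition,
-- kept sorted by event_time within each partition.
CREATE TABLE user_events (
  user_id uuid,
  event_time timeuuid,
  payload text,
  PRIMARY KEY (user_id, event_time)
);
```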

When writing to Cassandra, you need to provide the full primary key; when 
reading from Cassandra, however, you only need to provide the full partition 
key.
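The asymmetry between writes and reads can be made concrete with a minimal sketch on a hypothetical readings table (not from the thread; the date type assumes Cassandra 2.2 or later). The two statements marked as rejected illustrate what Cassandra refuses:

```cql
-- Hypothetical table: partition key (sensor_id, day), clustering key ts.
CREATE TABLE readings (
  sensor_id text,
  day date,
  ts timestamp,
  value double,
  PRIMARY KEY ((sensor_id, day), ts)
);

-- Write: the full primary key (sensor_id, day and ts) is supplied.
INSERT INTO readings (sensor_id, day, ts, value)
VALUES ('s1', '2016-09-30', '2016-09-30 14:00:00+0000', 21.5);

-- Rejected: the clustering column ts is missing.
INSERT INTO readings (sensor_id, day, value)
VALUES ('s1', '2016-09-30', 21.5);

-- Read: the full partition key is enough; every row in it is returned.
SELECT * FROM readings WHERE sensor_id = 's1' AND day = '2016-09-30';

-- Rejected: only part of the partition key is restricted.
SELECT * FROM readings WHERE sensor_id = 's1';
```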

When you only provide the partition key for a read operation, you're able to 
return all columns that exist on that partition with low latency. These columns 
are displayed as "CQL rows" to make it easier to reason about.

Consider the schema:

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)  -- partition key (bar, boz), clustering key baz
);
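One way to see several CQL rows living in one partition is to write two rows that share bar and boz, then read the partition back by its partition key alone. The UUID literals below are arbitrary placeholders, not values from the thread:

```cql
-- Two CQL rows in the same (bar, boz) partition; baz differs per row.
INSERT INTO foo (bar, boz, baz, data1, data2)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c,
        now(), 'first', 'a');

INSERT INTO foo (bar, boz, baz, data1, data2)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c,
        now(), 'second', 'b');

-- Reading by the partition key alone returns both rows,
-- ordered by the clustering key baz.
SELECT * FROM foo
WHERE bar = 62c36092-82a1-3a00-93d1-46196ee77204
  AND boz = 7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c;
```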

When you write to Cassandra, you need to send bar, boz, and baz, and 
optionally data*, if it's relevant for that CQL row. If you choose not to set 
a data* field for a particular CQL row, then nothing is stored or allocated for 
it on disk. But I wouldn't consider that caveat to be "schema-less".
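The difference between leaving a column out and setting it to null can be sketched as follows, again with arbitrary placeholder UUIDs. Omitting data2 writes no cell at all. Writing an explicit null instead stores a tombstone for that cell, which occupies space until compaction eventually purges it:

```cql
-- data2 omitted: no cell is written for it, so nothing is stored on disk.
INSERT INTO foo (bar, boz, baz, data1)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c,
        now(), 'only data1');

-- data2 explicitly null: Cassandra records a tombstone for the data2 cell.
INSERT INTO foo (bar, boz, baz, data1, data2)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c,
        now(), 'only data1', null);
```

Both rows read back with data2 shown as null. The difference lies only in what is written to disk.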

All writes to the same bar/boz will end up on the same Cassandra replica set 
(a configurable number of nodes) and be stored in the same place(s) on disk 
within the SSTable(s). On disk, each field that's not part of the partition key 
is stored as a column, including the clustering keys (this is optimized in 
Cassandra 3+, but now we're getting deep into internals).
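The "configurable number of nodes" is the keyspace's replication factor. A minimal sketch, with a hypothetical keyspace name and SimpleStrategy chosen only for brevity (multi-datacenter clusters would normally use NetworkTopologyStrategy):

```cql
-- Hypothetical keyspace: every partition of every table created in it
-- is stored on 3 replica nodes.
CREATE KEYSPACE example_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- A copy of the foo schema in that keyspace: each (bar, boz) partition
-- now lives on the same 3-node replica set.
CREATE TABLE example_ks.foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);
```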

This way, you can get fast responses for all activity for a given bar/boz, 
either over a time range or at a specific time, with roughly the same number of 
disk seeks; only the length of the disk scan varies.
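Both access patterns can be sketched against foo as follows. The UUID and timeuuid literals are arbitrary placeholders, and the September 2016 window is only an example. minTimeuuid() turns a timestamp into the smallest timeuuid for that instant, so the range below is half-open:

```cql
-- Activity over time: one partition, sliced by the clustering key baz.
SELECT baz, data1, data2 FROM foo
WHERE bar = 62c36092-82a1-3a00-93d1-46196ee77204
  AND boz = 7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c
  AND baz >= minTimeuuid('2016-09-01 00:00:00+0000')
  AND baz <  minTimeuuid('2016-10-01 00:00:00+0000');

-- The ten most recent entries in the same partition.
SELECT baz, data1, data2 FROM foo
WHERE bar = 62c36092-82a1-3a00-93d1-46196ee77204
  AND boz = 7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c
ORDER BY baz DESC
LIMIT 10;

-- A specific time: the full primary key identifies a single CQL row.
SELECT data1, data2 FROM foo
WHERE bar = 62c36092-82a1-3a00-93d1-46196ee77204
  AND boz = 7c4b9a0e-5f1d-4e2a-9b3c-2d1e0f9a8b7c
  AND baz = 50554d6e-29bb-11e5-b345-feff819cdc9f;
```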

Hope that helps!

Joaquin Casares

Consultant

Austin, TX

Apache Cassandra Consulting

http://www.thelastpickle.com

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:

Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra

Carlos Alonso | Software Engineer | @calonso

On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:

Hi all, 

I have a theoretical question: 

- Is Apache Cassandra really a column store?

A column store means storing the data as columns rather than as rows. 

In fact, C* stores the data as rows, and the data is partitioned by row key.

So, to me, Cassandra is a row-oriented, schema-less DBMS. Is that true for you 
as well?

Many thanks in advance for your reply

Best Regards 

Mehdi Bada

----

Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 

dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com 

www.dbi-services.com

