Unsubscribe

On Monday, October 3, 2016, Benedict Elliott Smith <bened...@apache.org> wrote:
> While that sentence leaves a lot to be desired (for me because it confers
> a different meaning on row store), it doesn't say "Cassandra is like a
> RDBMS" - it says "like an RDBMS, it organises data by rows and columns" -
> i.e., in this regard only it is like an RDBMS, not more generally.
>
> I believe it was meant to help people, especially those afraid of the
> NoSQL thrift world, understand that it still uses the basic concept of
> rows and columns they are used to. I agree it could be improved to
> minimise the chance of misreading it, and I'm certain contributions would
> be welcome here.
>
> I don't personally want to get bogged down in analysing every piece of
> text anyone has ever written, so I'll bow out of further discussion on
> this. These phrases may all be suboptimal, but they are certainly
> defensible. Column store is not, that's all I wanted to contribute here.
>
> On 1 October 2016 at 19:35, Peter Lin <wool...@gmail.com> wrote:
>
>> I'll second Ed's comment.
>>
>> The documentation should be more careful when using phrases "like
>> relational databases". When we look at the history of relational databases,
>> people expect certain things like ACID transactions, primary/foreign key
>> constraints, query planners, joins and relational algebra. Clearly
>> Cassandra's storage engine does not follow most of those principles, for
>> good reason.
>>
>> The term row oriented storage would be more descriptive and appropriate.
>> It avoids conflating Cassandra's storage engine with "traditional" relational
>> storage engines. Those of us that have spent over a decade using IBM DB2,
>> Oracle, SQL Server and Sybase tend to think of relational databases in a
>> certain way. If we go back to 1998, most RDBMS storage engines had a max row
>> size limit. Databases like Sybase before version 9 preferred RAW disk for
>> optimal performance. I can go on and on, but there's no point really.
>>
>> Cassandra's storage engine is "row oriented", but it's not relational in
>> the RDBMS sense. We do everyone a huge disservice by using confusing
>> terminology and then making fun of those who get confused. No one wins when
>> that happens. At the end of the day, what differentiates Cassandra's
>> storage engine is that it supports static and dynamic columns, which
>> traditional RDBMS don't support today. Calling Cassandra storage
>> "distributed tables" doesn't really help, in my biased opinion.
>>
>> For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
>> distributed tables" they might answer "so what, SQL Server and Oracle can
>> do that too." The difference is that with an RDBMS the partitioning is
>> optional and requires more work to configure, whereas with Cassandra you
>> can have everything in 1 node, which means there is only 1 partition and
>> it is no different to 1 instance of SQL Server. Where you win is when you
>> need to add 2 more nodes: Cassandra makes this easier, whereas with
>> SQL Server and Oracle you have to do a little bit more work. I've lost
>> count of how many times I've explained NoSQL databases to RDBMS admins and
>> had to explain that the official docs are stupid.
>>
>> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> https://github.com/apache/cassandra
>>>
>>> Row store <http://wiki.apache.org/cassandra/DataModel> means that like
>>> relational databases, Cassandra organizes data by rows and columns. The
>>> Cassandra Query Language (CQL) is a close relative of SQL.
>>>
>>> I generally do not know what to say about these high level
>>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>>> or do they mean IP routers with layer 4 packet inspections and layer 3
>>> Access Control Lists?
>>>
>>> We say (and I catch myself doing it all the time) "like relational
>>> databases" often as if all relational databases work alike. A columnar
>>> store like HP Vertica is a relational database. MySQL has different storage
>>> engines; does MyISAM work like InnoDB?
>>>
>>> Google Docs organizes data by rows and columns as well. You can wrap any
>>> storage system in an API that makes it look like rows and columns.
>>> Microsoft LINQ can enumerate your network cards and query them
>>> https://msdn.microsoft.com/en-us/library/bb308959.aspx , that really
>>> does not make your network cards a "row store".
>>>
>>> "Theoretically a row can have 2 billion columns, but in practice it
>>> shouldn't have more than 100 million columns."
>>> In practice (in my experience) the number is much lower than 100
>>> million, and if the data is actually deleted and re-added frequently the
>>> number of live columns (rows, whatever) you can use happily is even lower.
>>>
>>> I believe on Twitter (I am unable to find the tweet) someone was trying
>>> to convince me Cassandra was a "columnar analytic database". ROFL
>>>
>>> I believe telling someone it is a "row store" "like a database" is not a
>>> good idea. They might walk away content with that explanation. You are
>>> setting them up to walk into an anti-pattern. Like a case where the user is
>>> attempting to write and delete 1 row and 1 column 6 billion times a day.
>>> Then you end up explaining to them
>>> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>>> and how the Cassandra storage model is not "like a relational database".
>>>
>>> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>
>>>> I can iterate over JSON data stored in mongo and present it as a table
>>>> with rows and columns. It does not make mongo a rowstore.
>>>>
>>>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>
>>>>> The problem with calling it a row store:
>>>>>
>>>>> https://en.wikipedia.org/wiki/Row_(database)
>>>>>
>>>>> In the context of a relational database
>>>>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*, also
>>>>> called a record
>>>>> <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
>>>>> <https://en.wikipedia.org/wiki/Tuple>, represents a single, implicitly
>>>>> structured data <https://en.wikipedia.org/wiki/Data> item in a table
>>>>> <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms, a
>>>>> database table can be thought of as consisting of *rows* and columns
>>>>> <https://en.wikipedia.org/wiki/Column_(database)> or fields
>>>>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>>>>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row
>>>>> in a table represents a set of related data, and every row in the table
>>>>> has the same structure.
>>>>>
>>>>> When you have static columns and rows with maps and lists, it is hard
>>>>> to argue that every row has the same structure. Physically, at the storage
>>>>> layer, they do not have the same structure, and logically, when accessing
>>>>> the data, they barely have the same structure, as the static column just
>>>>> appears inside each row even though it is not actually contained in it.
>>>>>
>>>>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>>
>>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>>> store" which usually needs some extra explanation but is more accurate
>>>>>> than "column family" or whatever other thrift era terminology people
>>>>>> still use.
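
Ed's point about static columns can be illustrated with a toy in-memory model. This is only a sketch, not Cassandra's storage engine: the class `ToyPartition` and its methods are hypothetical names invented for this example. It assumes that a static column is held once per partition and merged into each row when the rows are presented, while regular columns (including maps and lists) live only in the rows where they were set.

```python
class ToyPartition:
    """Hypothetical model of one partition with static and regular columns."""

    def __init__(self):
        self.static = {}  # static columns: one copy for the whole partition
        self.rows = {}    # clustering key -> regular columns set for that row

    def set_static(self, **cols):
        self.static.update(cols)

    def set_row(self, clustering_key, **cols):
        self.rows.setdefault(clustering_key, {}).update(cols)

    def cql_rows(self):
        # Present the partition as "rows": the static values are merged into
        # every row at read time, although they are stored only once.
        return [
            {"ck": ck, **self.static, **cols}
            for ck, cols in sorted(self.rows.items())
        ]


p = ToyPartition()
p.set_static(owner="ed")
p.set_row(1, tags=["a", "b"])       # this row has a list column
p.set_row(2, attrs={"k": "v"})      # this row has a map column instead
for row in p.cql_rows():
    print(row)
```

In this model the two rows carry different keys (`tags` versus `attrs`), and changing `owner` once changes what every row shows, which is the sense in which the rows do not share one physical structure.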
>>>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>
>>>>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>>>>> tables. This definition is closer to CQL and has some academic
>>>>>>> background (distributed hash table).
>>>>>>>
>>>>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
>>>>>>>
>>>>>>>> Cassandra is not a "wide column store" anymore. It has a schema.
>>>>>>>> Only thrift users still think they don't have a schema (though they
>>>>>>>> do), and thrift is being deprecated.
>>>>>>>>
>>>>>>>> I really wish everyone would kill the term "wide column store" with
>>>>>>>> fire. It seems to have never meant anything beyond "schema-less,
>>>>>>>> row-oriented", and a "column store" means literally the opposite of
>>>>>>>> this.
>>>>>>>>
>>>>>>>> Not only that, but people don't even seem to realise the term
>>>>>>>> "column store" existed long before "wide column store" and the latter
>>>>>>>> is often abbreviated to the former, as here:
>>>>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>>>>
>>>>>>>> Since it no longer applies, let's all agree as a community to
>>>>>>>> forget this awful nomenclature ever existed.
>>>>>>>>
>>>>>>>> On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mehdi,
>>>>>>>>>
>>>>>>>>> I can help clarify a few things.
>>>>>>>>>
>>>>>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a
>>>>>>>>> row can have 2 billion columns, but in practice it shouldn't have
>>>>>>>>> more than 100 million columns.
>>>>>>>>>
>>>>>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>>>>>> keys. Together, the partition key(s) and clustering key(s) form the
>>>>>>>>> primary key.
>>>>>>>>>
>>>>>>>>> When writing to Cassandra, you will need to provide the full
>>>>>>>>> primary key; however, when reading from Cassandra, you only need to
>>>>>>>>> provide the full partition key.
>>>>>>>>>
>>>>>>>>> When you only provide the partition key for a read operation,
>>>>>>>>> you're able to return all columns that exist on that partition with
>>>>>>>>> low latency. These columns are displayed as "CQL rows" to make it
>>>>>>>>> easier to reason about.
>>>>>>>>>
>>>>>>>>> Consider the schema:
>>>>>>>>>
>>>>>>>>> CREATE TABLE foo (
>>>>>>>>>   bar uuid,
>>>>>>>>>   boz uuid,
>>>>>>>>>   baz timeuuid,
>>>>>>>>>   data1 text,
>>>>>>>>>   data2 text,
>>>>>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>>>>>> );
>>>>>>>>>
>>>>>>>>> When you write to Cassandra you will need to send bar, boz, and
>>>>>>>>> baz and optionally data*, if it's relevant for that CQL row. If you
>>>>>>>>> choose not to define a data* field for a particular CQL row, then
>>>>>>>>> nothing is stored nor allocated on disk. But I wouldn't consider that
>>>>>>>>> caveat to be "schema-less".
>>>>>>>>>
>>>>>>>>> However, all writes to the same bar/boz will end up on the same
>>>>>>>>> Cassandra replica set (a configurable number of nodes) and be stored
>>>>>>>>> on the same place(s) on disk within the SSTable(s). And on disk, each
>>>>>>>>> field that's not a partition key is stored as a column, including
>>>>>>>>> clustering keys (this is optimized in Cassandra 3+, but now we're
>>>>>>>>> getting deep into internals).
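
The read and write rules described for the `foo` schema can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not how Cassandra is implemented: `ToyTable`, `write` and `read` are hypothetical names, plain integers stand in for the `timeuuid` clustering key `baz`, and a single Python dict stands in for the replica set that owns each partition.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Hypothetical model of foo with PRIMARY KEY ((bar, boz), baz)."""

    def __init__(self):
        # Partition key (bar, boz) -> list of (baz, columns), kept sorted by baz.
        self.partitions = defaultdict(list)

    def write(self, bar, boz, baz, **data):
        # A write needs the full primary key; data1/data2 are optional.
        rows = self.partitions[(bar, boz)]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, baz)
        if i < len(rows) and rows[i][0] == baz:
            rows[i][1].update(data)            # same primary key: upsert
        else:
            rows.insert(i, (baz, dict(data)))  # only supplied columns are stored

    def read(self, bar, boz, baz=None):
        # A read needs only the full partition key; baz narrows it to one row.
        rows = self.partitions.get((bar, boz), [])
        return [(k, cols) for k, cols in rows if baz is None or k == baz]


t = ToyTable()
t.write("a", "b", 2, data1="second")
t.write("a", "b", 1, data1="first", data2="x")
t.write("c", "d", 1)
print(t.read("a", "b"))      # every row in partition (a, b), ordered by baz
print(t.read("a", "b", 2))   # one specific row
```

The row written without `data2` simply has no entry for it, mirroring the remark that nothing is stored for an undefined data* field, and everything for one bar/boz pair sits together in clustering order.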
>>>>>>>>>
>>>>>>>>> In this way you can get fast responses for all activity for
>>>>>>>>> bar/boz either over time, or for a specific time, with roughly the
>>>>>>>>> same number of disk seeks, with varying lengths on the disk scans.
>>>>>>>>>
>>>>>>>>> Hope that helps!
>>>>>>>>>
>>>>>>>>> Joaquin Casares
>>>>>>>>> Consultant
>>>>>>>>> Austin, TX
>>>>>>>>>
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>
>>>>>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>>>>>>>>
>>>>>>>>>> Cassandra is a Wide Column Store
>>>>>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>>>>>
>>>>>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>>>>>> <https://twitter.com/calonso>
>>>>>>>>>>
>>>>>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a theoretical question:
>>>>>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>>>>>> Column store means storing the data as columns rather than as
>>>>>>>>>>> rows.
>>>>>>>>>>>
>>>>>>>>>>> In fact C* stores the data as rows, and data is partitioned by
>>>>>>>>>>> row key.
>>>>>>>>>>>
>>>>>>>>>>> Finally, for me, Cassandra is a row oriented, schema-less
>>>>>>>>>>> DBMS.... Is it true for you also???
>>>>>>>>>>>
>>>>>>>>>>> Many thanks in advance for your reply
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Mehdi Bada
>>>>>>>>>>> ----
>>>>>>>>>>>
>>>>>>>>>>> *Mehdi Bada* | Consultant
>>>>>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>>>>>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>>>>>> mehdi.b...@dbi-services.com
>>>>>>>>>>> www.dbi-services.com