Re: [VOTE] Change data model names for 0.5
-1

On Mon, Aug 24, 2009 at 12:29 PM, Evan Weaver ewea...@gmail.com wrote:
> Resolved, that the data model names should be changed in Cassandra 0.5.
>
> Evan
>
> PS. Committers have the most weight, but everyone's voice is heard.
>
> --
> Evan Weaver
Re: Data model names, reloaded
Evan,

> PS. The implementation of column families hasn't changed from BigTable,
> but the use in modeling has. Common Cassandra designs are more
> row-oriented than column-oriented.

I'm not sure I understand the distinction you're drawing between
row-oriented modeling and column-oriented modeling. Are you talking about
row-oriented modeling as placing entire objects in a column (today's
nomenclature) and treating a Cassandra column like a database row?

Sandeep
Re: [VOTE] 0.3.0-final
+1

On Fri, Jun 26, 2009 at 10:49 AM, Jonathan Ellis jbel...@gmail.com wrote:
> I propose releasing 0.3.0-rc3 as 0.3.0-final. We've had some unofficial
> voting on the rc3 thread, but this is the official one. :)
>
> Voting is open for 72h.
>
> binary build is at http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc3.tar.gz
> svn tag is https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc3
> changelog is https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc3/CHANGES.txt
Re: Thoughts on a possible query language
There is some (unfinished) code in the current repo on CQL, a SQL-like
Cassandra Query Language, that is super simple and (AFAIK) limited to
single-node queries.

I suspect there are bigger questions to tackle before we get to query
languages in the sense we're talking about:

1. Data model -- Cassandra's values are byte arrays. Any proposal for a
language needs to figure out precisely what data model you're planning to
support. (Your examples include numbers, dates, and strings.)
2. Secondary indexes
3. Query runtime (queries that run on a single node, multiple nodes, a
query optimizer?)

I've never understood the value of a parallel-programming abstraction
(map-reduce) for a single-node database (CouchDB) ... and I certainly don't
think we're ready to build a map-reduce view engine *in* Cassandra right
now.

IMHO, there are a bunch of interesting issues we will need to solve before
we can seriously talk about a query language.

On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo a...@bengler.no wrote:
> Has anyone given thought to how an SQL-like query language could be
> integrated into Cassandra? I'm thinking of something which would let you
> evaluate a limited set of relational select operators. For example:
>
> * first_name = 'Bob'
> * age > 32
> * created_at between '2009-08' and '2009-09'
> * employer_id in (34543, 13177, 9338)
>
> First, is such functionality desired within the framework of Cassandra,
> or do people prefer to keep this functionality in a completely separate
> server component? There are pros and cons to keeping queries inside
> Cassandra. I could enumerate them, but I would like to hear other
> people's thoughts first.
>
> An alternative to a text-based query syntax would be to borrow CouchDB's
> concept of views [1]. In CouchDB, views are pre-defined indexes which are
> populated by filtering data through a pair of map/reduce functions, which
> are usually written in JavaScript.
> Views are somewhat limited in expressiveness and flexibility, and do not
> address all possible use cases, but they are very efficient to compute
> and store, and are a fairly elegant system.
>
> Some challenges come to mind:
>
> Cassandra's distributed nature means that a node's queryable indexes
> can/should only reference data in that same node's partition, and that a
> query might have to be executed on multiple nodes. For performance, the
> query processing needs to be parallelized and pipelined.
>
> Could a query planner/optimizer reduce the number of nodes required to
> satisfy a query by looking at the distribution of column values across
> nodes? For example, if the first_name value 'Foo' only occurs on node A,
> there's no need to involve node B. But such knowledge requires the
> maintenance of statistics on each node that cover all known peers, and
> the statistics must be kept up to date to avoid glaring consistency
> issues.
>
> Given the nature of Cassandra's column families, it's not immediately
> obvious to me how best to address columns in such a language.
>
> [1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>
> A.
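[Editor's note: point 1 in the reply above -- that Cassandra's values are
opaque byte arrays -- is worth making concrete. The sketch below is a toy
client-side predicate evaluator in Python, not anything in Cassandra; the
decoder table and helper names are invented for illustration. It shows why
a query layer must first impose typing on raw bytes before predicates like
Alexander's examples can be evaluated at all.]

```python
import struct

# Hypothetical typed decoders: a query layer has to know how to interpret
# Cassandra's opaque byte-array values before any comparison makes sense.
DECODERS = {
    "first_name": lambda b: b.decode("utf-8"),
    "age": lambda b: struct.unpack(">i", b)[0],
    "employer_id": lambda b: struct.unpack(">q", b)[0],
}

def matches(row, predicates):
    """row: dict of column name -> raw bytes.
    predicates: (column, op, operand) triples mirroring the thread's examples."""
    for col, op, operand in predicates:
        raw = row.get(col)
        if raw is None:
            return False
        value = DECODERS[col](raw)  # typing happens here, not in the store
        if op == "=" and not value == operand:
            return False
        elif op == ">" and not value > operand:
            return False
        elif op == "in" and value not in operand:
            return False
    return True

row = {
    "first_name": "Bob".encode("utf-8"),
    "age": struct.pack(">i", 40),
    "employer_id": struct.pack(">q", 34543),
}
preds = [("first_name", "=", "Bob"),
         ("age", ">", 32),
         ("employer_id", "in", (34543, 13177, 9338))]
print(matches(row, preds))  # True
```

Doing this on the client works but ships every candidate row over the wire;
pushing it server-side is exactly where the secondary-index and
query-runtime questions (points 2 and 3) come in.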
Re: VOTE: Cassandra 0.3.0 release
+1

On Mon, Jun 22, 2009 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote:
> It's been two weeks since the last code change on the 0.3 branch and
> several days since the RC2 release with no new bug reports. I move that
> we release RC2 as 0.3.0 official.
>
> svn tag: https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc2/
> binary build: http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc2.tar.gz
>
> Voting will run for 3 days. +1 from me.
>
> -Jonathan
Re: proposal: rename table to namespace
I'd like for us to continue with Table as well. I agree with Alexander's
argument for what namespaces mean in most CS domains. Moving up a notch to
a "database" is also confusing (Do we also have tables? Are there
tablespaces? Different storage engines for each tablespace?). We'll have to
think of new names for columns and supercolumns too -- I'd rather we stayed
with Table.

On Sat, Jun 20, 2009 at 11:16 AM, Chris Goffinet c...@chrisgoffinet.com wrote:
> I think we should keep it as 'table'. It's understood everywhere. Even
> BigTable calls it a Table, as far as I've heard. I think namespace might
> just be more confusing.
>
> On Jun 20, 2009, at 6:54 AM, Jonathan Ellis wrote:
>> Since we're proposing things that break stuff this weekend... :) I
>> think we should rename "table" to "namespace" in the config file.
>> Calling it table confuses people coming from an RDBMS background (i.e.
>> just about everyone).
>>
>> -Jonathan
Re: 0.3 and the OOM gremlin (CASSANDRA-208)
Won't things like multi-table support break binary compatibility anyway? We
might be stuck with having to write a tool that migrates from the 0.3
format to the 0.4 format.

On Wed, Jun 3, 2009 at 2:44 PM, Jonathan Ellis jbel...@gmail.com wrote:
> The fix for 208 [1] is fairly invasive. Should we (a) release another RC
> and do more testing before 0.3 final, or (b) release 0.3 without these
> changes, and only add this fix to trunk?
>
> Although I see the 0.3 release primarily as a means to let people start
> playing with the Cassandra data model, I don't know that I want to
> release it knowing that 0.4 is going to be binary-incompatible with the
> 0.3 data files. So I'd be inclined towards (a).
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-208
>
> -Jonathan
Re: Versioning scheme
Looks like there are Apache projects that do both: Nutch had 0.8, 0.8.1,
and now 0.9; Ant and Hadoop seem to follow 0.19.0-style numbering. I don't
care either way :)

On Thu, May 14, 2009 at 11:06 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
> I guess this time it's my OCD that thinks having a 0.3 and then a 0.3.1
> feels wrong, something missing on the first one :)
>
> /Johan
>
> Jonathan Ellis wrote:
>> There's nothing in 0.3 that implies there won't be a 0.3.1.
>>
>> On Thu, May 14, 2009 at 12:48 PM, Johan Oskarsson jo...@oskarsson.nu wrote:
>>> The current versions in JIRA are 0.3 and 0.4; should we not explicitly
>>> mention the point release? For example 0.3.0, to make it consistent
>>> when we release bug fixes in 0.3.1. Thoughts?
>>>
>>> /Johan
Re: Submit patch link in jira
On a related note, what's the deal with the "Start work" link? I used to
see it, but not so much on the newer tickets.

On Thu, May 14, 2009 at 12:31 PM, Jonathan Ellis jbel...@gmail.com wrote:
> This marks the ticket as "I have a patch available" so it shows up here:
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=12310865&status=10002
>
> Let's try to remember to use this, since it makes it easier to see what
> is ready for review. I know I have been sloppy here in the past, and I
> will try to do better too.
>
> -Jonathan
Re: Row vs CF
Yes, each CF has its own memtable. The writes are atomic in the sense that
I can still do an all-or-nothing write to multiple CFs (the CommitLog still
logs the whole row). Having multiple CFs with their own memtables simply
means that concurrent operations may not be *isolated* from each other.

So, if I have 2 operations:

Op1: Write(key1, CF1:col1=new, CF2:col2=new)
Op2: Read(key1, CF1:col1, CF2:col2)

Assuming both columns had "old" as the previous value, depending on the
execution schedule Op2 could return one of:

old, old -- Op2 before Op1
old, new -- Op1 writes CF2, then Op2 gets scheduled
new, old -- Op1 writes CF1, then Op2 gets scheduled
new, new -- Op1 before Op2

But with time (eventually), re-execution of Op2 will always return the last
result. I agree that this is of limited value right now, but atomicity
without isolation can still be useful. It'll save the app some cleanup and
book-keeping code.

On Wed, Apr 22, 2009 at 9:36 AM, Jonathan Ellis jbel...@gmail.com wrote:
> On Wed, Apr 22, 2009 at 11:32 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
>> Having multiple CFs in a row could be useful for writes. Consider the
>> case when you use one CF to store the data and another to store some
>> kind of secondary index on that data. It will be useful to apply
>> updates to both families atomically.
>
> Except that's not how it works: each Memtable (CF) has its own executor
> thread, so even if you put multiple CFs in a Row it's not going to be
> atomic with the current system, and it's a big enough change to try to
> add some kind of coordination there that I don't think it's worth it.
> (And you have seen that I am not scared of big changes, so that should
> give you pause. :)
>
> Back to YAGNI. :) Row doesn't fit in the current execution model, so
> rather than leaving it as a half-baked creation, better to excise it,
> and if we ever decide to support atomic updates across CFs then that
> would be the time to add it or something like it back.
>
> -Jonathan
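[Editor's note: the four possible read results can be enumerated
mechanically. The sketch below is a toy scheduling model in Python, not
Cassandra code: it treats Op1 as two independent per-memtable write steps
racing against Op2's single two-column read, and collects every observable
outcome.]

```python
from itertools import permutations

def read_results():
    """Enumerate what Op2 (read CF1:col1, CF2:col2) can observe while
    Op1's two per-CF writes execute on independent executor threads."""
    results = set()
    # Op1 is two independent steps (one per memtable); Op2 is one read step.
    steps = ["write_cf1", "write_cf2", "read"]
    for schedule in permutations(steps):
        state = {"cf1": "old", "cf2": "old"}
        for step in schedule:
            if step == "write_cf1":
                state["cf1"] = "new"
            elif step == "write_cf2":
                state["cf2"] = "new"
            else:  # the read observes whatever has landed so far
                results.add((state["cf1"], state["cf2"]))
    return results

# All four (old/new, old/new) combinations are observable: atomicity of
# the logged row does not imply isolation of concurrent readers.
print(sorted(read_results()))
```

Note the asymmetry the model captures: once both write steps complete,
every later read returns ("new", "new") -- the eventual-convergence point
made in the message above.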
Re: could cassandra be split into two parts?
Depends on what exactly you have in mind ... Almost all of the storage
engine logic is in the db package. I don't think it would be too hard to
make this pluggable so you could slide in your own DB, say based on
Derby/MySQL/BDB, etc. I can see how specialized implementations of the
database part could be useful for different apps.

Do you expect that the API will still be the same put/get-style Thrift API?
Or are you hoping to expose the additional abilities of the underlying DB
through the Thrift API? That makes the question more interesting (and
complicated).

On Sat, Apr 11, 2009 at 6:33 PM, Ian Holsman i...@holsman.net wrote:
> hey. I was wondering how feasible it would be to de-couple the P2P layer
> of Cassandra from the storage engine. I'd like to be able to plug in a
> non-column DB underneath, and use the DHT layer of Cassandra. Is this
> something anyone else has considered doing?
>
> --
> Ian Holsman
> i...@holsman.net
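[Editor's note: the pluggable storage layer discussed above would come down
to a narrow seam between the DHT/routing layer and the engine. The sketch
below is a hypothetical interface in Python for illustration only -- the
names are invented, and Cassandra's actual db package is Java and far
richer. A Derby-, MySQL-, or BDB-backed engine would implement the same
narrow API.]

```python
from abc import ABC, abstractmethod
from typing import Optional

class StorageEngine(ABC):
    """Hypothetical seam between the DHT layer and a storage backend."""

    @abstractmethod
    def put(self, key: bytes, column: bytes, value: bytes) -> None:
        """Persist one column value under a row key."""

    @abstractmethod
    def get(self, key: bytes, column: bytes) -> Optional[bytes]:
        """Return the stored value, or None if absent."""

class InMemoryEngine(StorageEngine):
    """Trivial dict-backed engine; any KV-capable DB could sit here instead."""

    def __init__(self):
        self._data = {}

    def put(self, key, column, value):
        self._data.setdefault(key, {})[column] = value

    def get(self, key, column):
        return self._data.get(key, {}).get(column)

# The routing layer would talk only to the interface, never the backend:
engine: StorageEngine = InMemoryEngine()
engine.put(b"key1", b"col1", b"hello")
print(engine.get(b"key1", b"col1"))  # b'hello'
```

Keeping the seam at put/get keeps the Thrift API unchanged; exposing extra
backend abilities (Ellis's second question) would mean widening this
interface, which is where it gets complicated.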
Re: working together
The refactoring question seems to be a bit of a thorn:

> My understanding was that new committers come in and start with some
> feature, implement that, and then slowly start looking into what more
> they could do going forward. It is NOT come in and refactor the hell out
> of the system because you like something to be in a specific way. I do
> not believe this will fly in any community. It is something like we are
> now going through the entire code base and changing all the stuff just
> because I like it in a specific way. This seems ludicrous.

I think it is reasonable that a codebase that has evolved for over two
years has significant opportunity for refactoring when it is opened to a
host of new developers. That said, large-scale refactoring *at this stage*
hurts us in two ways:

1. We don't have a rich suite of unit tests. Even automatic refactoring
without unit tests makes me uncomfortable.
2. Big refactoring makes it difficult for the original developers (AP) to
review patches quickly.

I can understand Avinash's resistance to big refactoring, and to some
extent, I agree. While I think we may need significant refactoring as the
codebase moves forward (to simplify it, keep it healthy, and make
contributions easier), perhaps we should hold off on accepting big
refactorings until:

a) We have a richer suite of unit tests.
b) We've done an initial stable release.

That seems like a reasonable restriction on the refactoring story, yes?
Avinash, Prashant, Jonathan, others -- does this seem like a good strategy?
Alternative ideas?
Re: secondary index support in Cassandra
The compaction optimization that Prashant mentioned is likely to solve many
of the problems that Jun brings up. We were thinking of tackling this
problem ... I've opened a ticket in JIRA
(https://issues.apache.org/jira/browse/CASSANDRA-16). Avinash, Prashant --
if you guys are already working on it, feel free to assign it to yourself.
Otherwise we'll sketch out a plan and send it out; if the community agrees
on the idea, we can start hacking away.

Sandeep

On Wed, Mar 25, 2009 at 10:50 AM, Jun Rao jun...@almaden.ibm.com wrote:
> Some comments inlined below.
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> jun...@almaden.ibm.com
>
> Avinash Lakshman avinash.laksh...@gmail.com wrote on 03/24/2009 10:08:45 PM:
>> Comments inline.
>>
>> On Tue, Mar 24, 2009 at 6:53 PM, Jun Rao jun...@almaden.ibm.com wrote:
>>> Prashant, thanks for the comments. They are quite useful. Let me try
>>> to address some of the points that you made.
>>>
>>> 1. It is true that in our current implementation, we can glue the
>>> changes to both the data and the index into one batch_update() call.
>>> This way, the data and the index will be maintained synchronously.
>>> However, maintaining the index on the server is likely more efficient
>>> since there is less communication overhead. You seem to agree with
>>> this.
>>
>> [Avinash] You can update multiple column families for a single key in
>> one mutation.
>>
>>> 2. Cassandra currently doesn't acquire a row lock for row accesses.
>>> However, the implication is that a reader may see partial updates of
>>> a row. For example, suppose that a writer updates two columns in
>>> different CFs. Then it is possible for a concurrent reader to see the
>>> update on one column, but not the other. For some applications,
>>> row-level consistency could be important. It's probably for this
>>> reason that, in HBase, a region server acquires a row lock for every
>>> read and write.
>>
>> [Avinash] Updates to a single row within a machine are atomic, which
>> means what you are stating will not happen. Writes and reads will be
>> serialized at the Memtable.
>
> This problem doesn't show up in Cassandra today because there is no
> method that can read columns from different CFs in a row. If there were
> such a method, it would be hard to enforce that a reader always sees a
> complete update (one that touches multiple CFs) without some sort of row
> locks.
>
>>> 3. For our current application, the size of all entities in a group
>>> is not too large and likely fits within the capacity of a single
>>> node. However, for other applications, being able to scale a group to
>>> more than a node could be useful. Storing a group within a single row
>>> will prevent scaling out the group.
>>
>> [Avinash] I guess the question is how many entities do you envision in
>> a group. What do you mean by fitting into one node?
>
> A large group may not fit in memory, but should fit on a commodity disk.
> The compaction optimization Prashant mentioned will definitely make our
> current approach more feasible. However, in general, I am a bit
> concerned about putting too much stuff within a row. A row is a unit
> that has finite capacity, and a user shouldn't expect to put an infinite
> number of columns within a row. I actually like the current assumption
> in Cassandra that a row has to fit in memory, since it simplifies the
> implementation. On the other hand, a table can have arbitrary capacity
> (one just needs to provision enough nodes in the cluster), and it can
> have as many rows as you want.
>
> Jun
>
> Prashant Malik pma...@gmail.com wrote on 03/24/2009 11:34:51 AM:
>> Some questions inline.
>>
>> On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao jun...@almaden.ibm.com wrote:
>>> We have an application that has groups and entities. A group has many
>>> entities, and an entity has a bunch of (attribute, value) pairs. A
>>> common access pattern is to select some number of entities within a
>>> group with attribute X equal to x, ordered by attribute Y. For
>>> efficiency, we want to build a secondary index for each group and
>>> collocate a group and its secondary index on the same node.
>>>
>>> Our current approach is to map a group to a row in Cassandra and each
>>> entity to a column in a column family (CF). Within the same row, we
>>> use a separate CF (ordered by name) to implement a secondary index,
>>> say on attributes X and Y. In this family, each column name has the
>>> form X:x:Y:y:entityID. We extended the get_slice() function so that
>>> it can get a slice of columns starting from a given column. The
>>> extended function uses the column-level index to locate the starting
>>> column quickly. (We'd be happy to contribute this extension back to
>>> Cassandra if people find this useful.) Using the extended
>>> get_slice(), we were able to access the entities through the
>>> simulated secondary
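[Editor's note: the X:x:Y:y:entityID scheme described above can be made
concrete with a toy model. The sketch below is plain Python, not the
authors' actual extended get_slice(); the data and helper names are
invented. It shows why a name-ordered CF doubles as a secondary index:
index column names sort by X first and Y second, so a slice starting at the
prefix for X = x yields that group's matching entities already ordered by
Y.]

```python
import bisect

def index_column(x_attr, x_val, y_attr, y_val, entity_id):
    # Column names of the form X:x:Y:y:entityID, as described in the thread.
    return f"{x_attr}:{x_val}:{y_attr}:{y_val}:{entity_id}"

# A name-ordered CF modeled as a sorted list of column names.
index_cf = sorted(
    index_column("X", x, "Y", y, eid)
    for x, y, eid in [("a", "2", "e1"), ("a", "1", "e2"), ("b", "1", "e3")]
)

def get_slice_from(columns, start, count):
    """Toy version of the extended get_slice(): return up to `count`
    columns starting at the first column name >= `start`."""
    i = bisect.bisect_left(columns, start)
    return columns[i:i + count]

# Entities with X = 'a', ordered by Y (the client stops reading once the
# X prefix changes):
print(get_slice_from(index_cf, "X:a:", 2))  # ['X:a:Y:1:e2', 'X:a:Y:2:e1']
```

The bisect into a sorted name list plays the role of the column-level index
that the extended get_slice() uses to find the starting column quickly.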