Re: [VOTE] Change data model names for 0.5

2009-08-24 Thread Sandeep Tata
-1

On Mon, Aug 24, 2009 at 12:29 PM, Evan Weaver ewea...@gmail.com wrote:
 Resolved, that the data model names should be changed in Cassandra 0.5.

 Evan

 PS. Committers have the most weight, but everyone's voice is heard.

 --
 Evan Weaver



Re: Data model names, reloaded

2009-08-21 Thread Sandeep Tata
Evan,

 PS. The implementation of column families hasn't changed from
 BigTable, but the use in modeling has. Common Cassandra designs are
 more row-oriented than column-oriented.

I'm not sure I understand the distinction you're drawing between
row-oriented modeling and column-oriented modeling.
Are you talking about row-oriented modeling as placing entire objects
in a column (today's nomenclature) and treating a Cassandra column
like a database row?

Sandeep


Re: [VOTE] 0.3.0-final

2009-06-26 Thread Sandeep Tata
+1

On Fri, Jun 26, 2009 at 10:49 AM, Jonathan Ellis jbel...@gmail.com wrote:
 I propose releasing 0.3.0-rc3 as 0.3.0-final.

 We've had some unofficial voting on the rc3 thread but this is the
 official one. :)

 Voting is open for 72h.

 binary build is at
 http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc3.tar.gz

 svn tag is 
 https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc3

 changelog is 
 https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc3/CHANGES.txt



Re: Thoughts on a possible query language

2009-06-22 Thread Sandeep Tata
There is some (unfinished) code in the current repo for CQL, a SQL-like
Cassandra Query Language, that is super simple and (AFAIK) limited to
single-node queries.

I suspect there are bigger questions to tackle before we get to query
languages in the sense we're talking about--
1. Data model -- Cassandra's values are byte arrays. Any proposal for a
language needs to figure out precisely what data model you're planning to
support. (Your examples include numbers, dates, strings; see the sketch
below.)
2. Secondary indexes
3. Query runtime (queries that run on a single node, multiple nodes, query
optimizer?)
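
To make point 1 concrete, here is a minimal sketch (purely
illustrative; nothing like this is in the repo) of the kind of
order-preserving encoding a typed query layer would need before a
predicate like age > 32 could be evaluated against byte-array values:

    // Illustrative only -- not part of Cassandra.
    import java.io.UnsupportedEncodingException;
    import java.nio.ByteBuffer;

    public class OrderPreservingEncoding {
        // Encode a signed long so that unsigned byte-wise comparison
        // of the result matches numeric order: big-endian bytes with
        // the sign bit flipped.
        public static byte[] encodeLong(long v) {
            return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
        }

        // UTF-8 bytes of an ASCII string already sort in string order.
        public static byte[] encodeString(String s)
                throws UnsupportedEncodingException {
            return s.getBytes("UTF-8");
        }
    }

Without some agreed encoding like this, a range predicate over raw
bytes is meaningless to the storage layer.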

I've never understood the value of a parallel-programming abstraction
(map-reduce) for a single-node database (CouchDB) ... and I certainly don't
think we're ready to build a map-reduce view engine *in* Cassandra right
now.

IMHO, there are a bunch of interesting issues we will need to solve before
we can seriously talk about a query language.


On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo a...@bengler.no wrote:

 Has anyone given thought to how an SQL-like query language could be
 integrated into Cassandra?

 I'm thinking of something which would let you evaluate a limited set
 of relational select operators. For example:

  * first_name = 'Bob'
  * age > 32
  * created_at between '2009-08' and '2009-09'
  * employer_id in (34543, 13177, 9338)

 First, is such functionality desired within the framework of
 Cassandra, or do people prefer to keep this functionality in a
 completely separate server component? There are pros and cons to keeping
 queries inside Cassandra. I could enumerate them, but I would like to
 hear other people's thoughts first.

 An alternative to a text-based query syntax would be to borrow
 CouchDB's concept of views [1]. In CouchDB, views are pre-defined
 indexes which are populated by filtering data through a pair of
 map/reduce functions, which are usually written in JavaScript. Views
 are somewhat limited in expressiveness and flexibility, and do not
 address all possible use cases, but they are very efficient to compute
 and store, and are a fairly elegant system.

 Some challenges come to mind:

 Cassandra's distributed nature means that a node's queryable indexes
 can/should only reference data in that same node's partition, and that
 a query might have to be executed on multiple nodes. For performance,
 the query processing needs to be parallelized and pipelined.

 Could a query planner/optimizer reduce the number of nodes
 required to satisfy a query by looking at the distribution of column
 values across nodes? For example, if the value "Foo" for the column
 "first_name" occurs only on node A, there's no need to involve node B. But
 such knowledge requires the maintenance of statistics on each node
 that cover all known peers, and the statistics must be kept up to date
 to avoid glaring consistency issues.

 Given the nature of Cassandra's column families, it's not immediately
 obvious to me how best to address columns in such a language.

 [1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

 A.



Re: VOTE: Cassandra 0.3.0 release

2009-06-22 Thread Sandeep Tata
+1

On Mon, Jun 22, 2009 at 8:14 AM, Jonathan Ellis jbel...@gmail.com wrote:

 It's been two weeks since the last code change on the 0.3 branch and
 several days since RC2 release with no new bug reports.

 I move that we release RC2 as 0.3.0 official.

 svn tag:
 https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc2/

 binary build:
 http://people.apache.org/~jbellis/cassandra/cassandra-0.3.0-rc2.tar.gz

 Voting will run for 3 days.

 +1 from me.

 -Jonathan



Re: proposal: rename table to namespace

2009-06-20 Thread Sandeep Tata
I'd like for us to continue with Table as well.
I agree with Alexander's argument for what namespaces mean for most
CS domains.

Moving up a notch to "database" is also confusing (Do we also have
tables? Are there tablespaces? Different storage engines for each
tablespace?)

We'll have to think of new names for columns and supercolumns too ---
I'd rather we stayed with Table.

On Sat, Jun 20, 2009 at 11:16 AM, Chris Goffinet c...@chrisgoffinet.com wrote:
 I think we should keep it as 'table'. It's understood everywhere. I've
 only ever heard BigTable call it a Table. I think namespace might just be
 more confusing.

 On Jun 20, 2009, at 6:54 AM, Jonathan Ellis wrote:

 Since we're proposing things that break stuff this weekend... :)

 I think we should rename "table" to "namespace" in the config file.
 Calling it "table" confuses people coming from an RDBMS background
 (i.e. just about everyone).

 -Jonathan




Re: 0.3 and the OOM gremlin (CASSANDRA-208)

2009-06-03 Thread Sandeep Tata
Won't things like multi-table support break binary compatibility anyway?

We might be stuck with having to write a tool that migrates from a 0.3
format to a 0.4 format.


On Wed, Jun 3, 2009 at 2:44 PM, Jonathan Ellis jbel...@gmail.com wrote:
 The fix for 208 [1] is fairly invasive. Should we

 (a) release another RC and do more testing before 0.3 final, or
 (b) release 0.3 without these changes, and only add this fix to trunk?

 Although I see the 0.3 release primarily as a means to let people
 start playing with the Cassandra data model, I don't know that I want
 to release it knowing that 0.4 is going to be binary-incompatible with
 the 0.3 data files.  So I'd be inclined towards (a).

 [1] https://issues.apache.org/jira/browse/CASSANDRA-208

 -Jonathan



Re: Versioning scheme

2009-05-14 Thread Sandeep Tata
Looks like there are Apache projects that do both:
Nutch had 0.8, 0.8.1, and now 0.9.
Ant and Hadoop seem to follow 0.19.0-style numbering.

I don't care either way :)

On Thu, May 14, 2009 at 11:06 AM, Johan Oskarsson jo...@oskarsson.nu wrote:
 I guess this time it's my OCD that thinks having a 0.3 and then a 0.3.1
 feels wrong, something missing on the first one :)

 /Johan

 Jonathan Ellis wrote:
 There's nothing in 0.3 that implies there won't be a 0.3.1.

 On Thu, May 14, 2009 at 12:48 PM, Johan Oskarsson jo...@oskarsson.nu wrote:
 The current versions in JIRA are 0.3 and 0.4; should we not explicitly
 mention the point release?

 For example 0.3.0, to make it consistent when we release bug fixes in 0.3.1.

 Thoughts?

 /Johan





Re: Submit patch link in jira

2009-05-14 Thread Sandeep Tata
On a related note, what's the deal with the "Start work" link? I used
to see it, but not so much for the newer tickets.

On Thu, May 14, 2009 at 12:31 PM, Jonathan Ellis jbel...@gmail.com wrote:
 this marks the ticket as "I have a patch available" so it shows up
 here:
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=12310865&status=10002

 let's try to remember to use this since it makes it easier to see what
 is ready for review. I know I have been sloppy here in the past and I
 will try to do better too.

 -Jonathan



Re: Row vs CF

2009-04-22 Thread Sandeep Tata
Yes, each CF has its own memtable. The writes are atomic in the sense
that I can still do an all-or-nothing write to multiple CFs (the
CommitLog still logs the whole row). Having multiple CFs with their
own memtable simply means that concurrent operations may not be
*isolated* from each other. So, if I have 2 operations:

Op1: Write(key1, CF1:col1=new, CF2:col2=new)
Op2: Read(key1, CF1:col1, CF2:col2)

Assuming both columns had "old" as the previous value, Op2 could
return one of the following depending on the execution schedule:

old, old -- Op2 before Op1
old, new -- Op1 writes CF2, then Op2 gets scheduled
new, old -- Op1 writes CF1, then Op2 gets scheduled
new, new -- Op1 before Op2

But with time (eventually), re-execution of Op2 will always return the
result of the last write.

I agree that this is of limited value right now, but atomicity without
isolation can still be useful. It'll save the app some cleanup and
book-keeping code.
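
A toy model of this "atomic but not isolated" behavior (this is not
Cassandra code, just an illustration of the schedules above):

    import java.util.concurrent.ConcurrentHashMap;

    // Toy model only: the row mutation is logged as one unit
    // (all-or-nothing on replay), but each CF's memtable is updated
    // independently, so a concurrently scheduled reader can observe
    // a mixed state.
    public class TwoCfRow {
        final ConcurrentHashMap<String, String> cf1 =
                new ConcurrentHashMap<String, String>();
        final ConcurrentHashMap<String, String> cf2 =
                new ConcurrentHashMap<String, String>();

        void op1Write() {
            cf1.put("col1", "new");
            // A reader scheduled exactly here sees new, old.
            cf2.put("col2", "new");
        }

        String[] op2Read() {
            return new String[] { cf1.get("col1"), cf2.get("col2") };
        }
    }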



On Wed, Apr 22, 2009 at 9:36 AM, Jonathan Ellis jbel...@gmail.com wrote:
 On Wed, Apr 22, 2009 at 11:32 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
 Having multiple CFs in a row could be useful for writes. Consider the
 case when you use one CF to store the data and another to store some
 kind of secondary index on that data. It will be useful to apply
 updates to both families atomically.

 Except that's not how it works: each Memtable (CF) has its own
 executor thread, so even if you put multiple CFs in a Row it's not
 going to be atomic with the current system, and it's a big enough
 change to try to add some kind of coordination there that I don't
 think it's worth it.  (And you have seen that I am not scared of big
 changes, so that should give you pause. :)

 Back to YAGNI. :)  Row doesn't fit in the current execution model, so
 rather than leaving it as a half-baked creation, better to excise it
 and if we ever decide to support atomic updates across CFs then that
 would be the time to add it or something like it back.

 -Jonathan



Re: could cassandra be split into two parts?

2009-04-11 Thread Sandeep Tata
Depends on what exactly you have in mind ...

Almost all of the storage engine logic is in the db package. I don't
think it would be too hard to make this pluggable so you could slide
in your own DB, say one based on Derby/MySQL/BDB. I can see how
specialized implementations of the database part could be useful for
different apps.
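
As a strawman (hypothetical; no such interface exists in the codebase
today), the surface a pluggable engine would need to expose to keep the
current put/get-style thrift API working might be as small as:

    // All names here are made up; column paths follow the
    // "column_family:column" convention the thrift API uses.
    public interface StorageEngine {
        void put(String table, String key, String columnPath, byte[] value);
        byte[] get(String table, String key, String columnPath);
        void remove(String table, String key, String columnPath);
    }

Anything beyond that (slices, range scans, the DHT-aware bits) is where
the decoupling gets harder.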

Do you expect that the API will still be the same put/get-style thrift
API? Or are you hoping to expose the additional abilities of the
underlying DB through the thrift API? That makes the question more
interesting (and complicated).


On Sat, Apr 11, 2009 at 6:33 PM, Ian Holsman i...@holsman.net wrote:
 hey.

 I was wondering how feasible it would be to decouple the P2P layer of
 Cassandra from the storage engine.
 I'd like to be able to plug in a non-column DB underneath, and use the DHT
 layer of cassandra.

 Is this something anyone else has considered doing?
 --
 Ian Holsman
 i...@holsman.net


Re: working together

2009-04-08 Thread Sandeep Tata
The refactoring question seems to be a bit of a thorn:

 My understanding was that new committers come in and start with some feature,
 implement that, and then slowly start looking into what more they could do
 going forward. It is NOT come in and refactor the hell out of the system
 because you like something to be in a specific way. I do not believe this
 will fly in any community. It is something like us now going through the
 entire code base and changing all the stuff just because I like it in a
 specific way. This seems ludicrous.

I think it is reasonable that a codebase that has evolved for over two
years has significant opportunity for refactoring when it is opened to
a host of new developers. That said, large-scale refactoring *at this
stage* hurts us in two ways:

1. We don't have a rich suite of unit tests. Even automatic
refactoring without unit tests makes me uncomfortable.
2. Big refactoring makes it difficult for the original developers
(A&P) to review patches quickly.

I can understand Avinash's resistance to big refactoring, and to some
extent, I agree.

While I think we may need significant refactoring as the codebase
moves forward (to simplify, keep it healthy and make contributions
easier), perhaps we should hold off on accepting big refactorings
until:
a) We have a richer suite of unit tests.
b) We've done an initial stable release.

That seems like a reasonable restriction on the refactoring story, yes?
Avinash, Prashant, Jonathan, others -- does this seem like a good
strategy? Alternative ideas?


Re: secondary index support in Cassandra

2009-03-26 Thread Sandeep Tata
The compaction optimization that Prashant mentioned is likely to solve
many of the problems that Jun brings up.

We were thinking of tackling this problem ... I've opened a ticket in
JIRA (https://issues.apache.org/jira/browse/CASSANDRA-16)

Avinash, Prashant -- If you guys are already working on it, feel free
to assign it to yourself. Otherwise we'll sketch out a plan and send
it out; if the community agrees on the idea, we can start hacking
away.

Sandeep


On Wed, Mar 25, 2009 at 10:50 AM, Jun Rao jun...@almaden.ibm.com wrote:

 Some comments inlined below.

 Jun
 IBM Almaden Research Center
 K55/B1, 650 Harry Road, San Jose, CA  95120-6099

 jun...@almaden.ibm.com


 Avinash Lakshman avinash.laksh...@gmail.com wrote on 03/24/2009 10:08:45
 PM:

 Comments inline.

 On Tue, Mar 24, 2009 at 6:53 PM, Jun Rao jun...@almaden.ibm.com wrote:

 
  Prashant,
 
  Thanks for the comments. They are quite useful. Let me try to address
  some of the points that you made.
 
  1. It is true that in our current implementation, we can glue the
  changes on both the data and the index in one batch_update() call.
  This way, the data and the index will be maintained synchronously.
  However, maintaining the index on the server is likely more efficient
  since there is less communication overhead. You seem to agree with
  this.


 [Avinash] You can update multiple column families for a single key in one
 mutation.

 
 
   2. Cassandra currently doesn't acquire a row lock for row accesses.
   However, the implication is that a reader may see partial updates of
   a row. For example, suppose that a writer updates two columns in
   different CFs. Then, it is possible for a concurrent reader to see
   the update on one column, but not the other one. For some
   applications, row-level consistency could be important. It's
   probably for this reason that, in HBase, a region server acquires a
   row lock for every read and write.


  [Avinash] Updates to a single row within a machine are atomic, which
  means what you are stating will not happen. Writes and reads will be
  serialized at the Memtable.

 This problem doesn't show up in Cassandra today because there is no method
 that can read columns from different CFs in a row. If there were such a
 method, it would be hard to enforce that a reader always sees a complete
 update (updating multiple CFs) without some sort of row locks.


 
 
   3. For our current application, the size of all entities in a group
   is not too large and likely fits within the capacity of a single
   node. However, for other applications, being able to scale a group
   to more than one node could be useful. Storing a group within a
   single row will prevent scaling out the group.

 [Avinash] I guess the question is how many entities do you envision in a
 group. What do you mean by fitting into one node?


 A large group may not fit in memory, but should fit on a commodity disk.
 The compaction optimization Prashant mentioned will definitely make our
 current approach more feasible.

 However, in general, I am a bit concerned about putting too much stuff
 within a row. A row is a unit that has finite capacity and a user shouldn't
 expect to put an infinite number of columns within a row. I actually like
 the current assumption in Cassandra that a row has to fit in memory since
 it simplifies the implementation. On the other hand, a table can have
 arbitrary capacity (one just needs to provision enough nodes in the cluster)
 and it can have as many rows as you want.

 
 
  Jun
  IBM Almaden Research Center
  K55/B1, 650 Harry Road, San Jose, CA  95120-6099
 
  jun...@almaden.ibm.com
 
 
  Prashant Malik pma...@gmail.com wrote on 03/24/2009 11:34:51 AM:
 
   Some questions inline
  
   On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao jun...@almaden.ibm.com
  wrote:
  
   
   
    We have an application that has groups and entities. A group has many
    entities and an entity has a bunch of (attribute, value) pairs. A
    common access pattern is to select some number of entities within a
    group with attribute X equal to x, ordered by attribute Y. For
    efficiency, we want to build a secondary index for each group and
    collocate a group and its secondary index on the same node. Our
    current approach is to map a group to a row in Cassandra and each
    entity to a column in a column family (CF). Within the same row, we
    use a separate CF (ordered by name) to implement a secondary index,
    say on attributes X and Y. In this family, each column name has the
    form of X:x:Y:y:entityID. We extended the get_slice() function so
    that it can get a slice of columns starting from a given column. The
    extended function uses the column-level index to locate the starting
    column quickly. (We'd be happy to contribute this extension back to
    Cassandra if people find this useful.) Using the extended
    get_slice(), we were able to access the entities through the
    simulated secondary
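
To illustrate the scheme Jun describes (names and encodings here are
hypothetical; this is not the extension itself), the index CF simulates
a secondary index purely through column naming:

    public class IndexColumnName {
        // Hypothetical sketch. Builds a name of the form
        // X:x:Y:y:entityID. Because the index CF is ordered by name,
        // entities with attribute X equal to x end up contiguous and
        // sorted by their Y value, so an extended get_slice() starting
        // at the "X:x:" prefix scans exactly the matches. Values must
        // be encoded so lexicographic order matches value order
        // (e.g. zero-padded numbers).
        public static String build(String attrX, String x,
                                   String attrY, String y,
                                   String entityId) {
            return attrX + ":" + x + ":" + attrY + ":" + y + ":" + entityId;
        }
    }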