On Thu, Sep 10, 2009 at 7:57 PM, Matt Corgan wrote:
> 1) What is the difference between a super-column like:
>
> homeAddress: {
>     street: "1234 x street",
>     city: "san francisco",
>     zip: "94107",
> }
>
> and the BigTable or HBase style of concatenating nested keys together
> into something like:
>
> homeAddress/street: "1234 x street",
> homeAddress/city: "san francisco",
> homeAddress/zip: "94107"
>
> Wouldn't they be sorted the same way on disk and be similarly
> efficient for range queries? Is it that you avoid storing the string
> "homeAddress" redundantly?

[Note that in Cassandra we refer to column "names" to avoid confusion
with row "keys."]

This is primarily useful when your column set is not fixed. Cassandra
can currently handle up to a million or so columns in a row without
problems, and with a little work could handle billions. So treating a
row as an associative array with dynamic column names that are
determined at runtime is a totally legitimate thing to do.

So if you are storing "objects" like address data, a supercolumn maps
more closely to what you would think of in an OO language as

    Map<String, Address> addresses

rather than having to treat each field separately:

    Map<String, String> streets
    Map<String, String> cities
    Map<String, String> zips

Besides being a more natural fit for the data, your row-level index of
column names is much more effective when related data is grouped like
this than when you repeat the name N times for N fields.

> 2) Can SuperColumns only add one level of nesting beyond normal
> columns? That seems limiting considering BigTable and HBase can append
> an arbitrary number of nested keys together.

Yes, only one level of nesting.

Remember, column names are just a byte[]. You can still smush column
names together if you want to. You don't need my permission. :)

(Although needing more than one level of nesting is often a sign you
should rethink your row model.)

> 3) Can you update the columns in the row of a supercolumn without
> overwriting the whole row?

Yes.

> Similarly,
> can you read a fraction of a SuperColumn without pulling the whole
> thing to the client?

Yes.

-Jonathan
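
P.S. To make the comparison in (1) concrete, here is a minimal sketch in
plain Java. It is not Cassandra client code: it only models the two
layouts with in-memory sorted maps. The class name SuperColumnSketch,
the helper methods, and the extra "workAddress" entry are all made up
for illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the two layouts; not the Cassandra API.
class SuperColumnSketch {

    // Supercolumn style: supercolumn name -> (subcolumn name -> value).
    // The name "homeAddress" is stored once for the whole group.
    public static SortedMap<String, SortedMap<String, String>> superColumnRow() {
        SortedMap<String, String> home = new TreeMap<>();
        home.put("street", "1234 x street");
        home.put("city", "san francisco");
        home.put("zip", "94107");

        SortedMap<String, SortedMap<String, String>> row = new TreeMap<>();
        row.put("homeAddress", home);
        return row;
    }

    // Concatenated style: the prefix is repeated in every column name.
    public static SortedMap<String, String> flatRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("homeAddress/street", "1234 x street");
        row.put("homeAddress/city", "san francisco");
        row.put("homeAddress/zip", "94107");
        row.put("workAddress/city", "oakland"); // made-up entry outside the prefix
        return row;
    }

    // Range query over concatenated names: every column whose name starts
    // with the prefix. Character.MAX_VALUE is used as an upper bound, so a
    // name containing that character right after the prefix would be missed.
    public static SortedMap<String, String> slice(SortedMap<String, String> row,
                                                  String prefix) {
        return row.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        // One lookup returns the whole address group.
        System.out.println(superColumnRow().get("homeAddress"));
        // The flat layout needs a prefix range scan to get the same fields.
        System.out.println(slice(flatRow(), "homeAddress/"));
    }
}
```

Both layouts keep the address fields adjacent in sort order, so the
range scan works either way. The difference the sketch shows is that
the nested map is addressed by one name, while the flat map repeats
the prefix once per field.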
