Hi,

Thank you very much for the useful info. I have one more question. If I create one more column family based on my query instead of going with a secondary index, will it affect write performance? Since I would need to duplicate the data into the second column family on every write, will that hurt write performance?
Thanks,
Baskar.S

On Tue, Aug 7, 2012 at 12:18 PM, Roshni Rajagopal <roshni.rajago...@wal-mart.com> wrote:

> Hi Baskar,
>
> The key aspect here is that you have to think of your queries and
> denormalize. Here are my suggestions based on my understanding so far.
>
> You seem to have 2 queries:
> A) What users do I have?
> B) What organizations do the users belong to?
>
> The first can be a static column family. These are similar to RDBMS
> 'master data' or 'dimensions' in the DWH world.
> So you can have a users_CF column family where the row key is the primary
> key, for example userid. As for using email id as the primary key:
> choose something which will never change (the natural key vs surrogate
> key debate).
>
> The second query is where the real power of the data model comes in. You
> would not have a separate organizations table with a foreign key to
> the users table.
> You would have a column family, say Oraganizations_Users_CF, with a row key
> corresponding to your 'where clause' needs, here the organization name. And
> then you can have a dynamic list of user names corresponding to each
> organization as column names. One organization can have 3 users (3 cols),
> another can have 10 (10 cols).
> Note that a row would automatically be sorted by username when you retrieve
> it, because the comparator is BytesType by default, which works for text
> sorting.
> If you want some other sort criterion, say last time logged in, keep
> that as the column name and the username as the column value. Column names
> can also store useful information, like a value in itself.
> Sorting is a design-time decision.
>
> I think there have been numerous posts advising against using secondary
> indexes, so try to keep the key of the column family as what you would be
> searching for, as far as possible.
>
> If you have a different query, you can create a new column family. It's OK
> to denormalize and have a separate column family per query.
>
> Regards,
> Roshni
>
> On 06/08/12 9:42 PM, "Alain RODRIGUEZ" <arodr...@gmail.com> wrote:
>
> >Cassandra modeling is well documented on the web and a bit too complex
> >to be explained in one mail.
> >
> >I advise you to read a lot before you make modeling choices.
> >
> >You may start with these links:
> >
> >http://www.datastax.com/docs/1.1/ddl/about-data-model#comparing-the-cassandra-data-model-to-a-relational-database
> >http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
> >
> >and this link seems interesting, but I haven't read it yet (about indexes):
> >
> >http://www.anuff.com/2011/02/indexing-in-cassandra.html
> >
> >I hope you'll find your answers within this documentation.
> >
> >Alain
> >
> >
> >2012/8/6 Baskar Sikkayan <baskar....@gmail.com>:
> >> Hi,
> >> I just wanted to learn Cassandra and am trying to convert an RDBMS
> >> design to Cassandra.
> >> Consider that my app is being deployed in multiple data centers.
> >>
> >> DB Design:
> >>
> >> A) CF : USER
> >>    1) email_id - primary key
> >>    2) fullname
> >>    3) organization - (I didn't create a separate table for organization)
> >>
> >> B) CF : ORG_USER
> >>    1) organization - Primary Key
> >>    2) email_id
> >>
> >> Here, my intention is to get the users belonging to an organization.
> >> I could make the organization in the user table a secondary index,
> >> but I heard that this may hurt performance.
> >> Could you please clarify which is the better approach?
> >>
> >>
> >> Thanks,
> >> Baskar.S
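
For reference, the denormalized design discussed in this thread can be sketched in a few lines of Python. This is only a toy in-memory model, not a Cassandra client: the ColumnFamily class and the add_user and users_in_org helpers are hypothetical names made up for illustration, and the example addresses are placeholders. It shows that one logical "add user" turns into inserts against both column families, and that the columns of a row read back sorted by column name.

```python
class ColumnFamily:
    """Toy in-memory stand-in for a column family (hypothetical, not a client API).

    Maps row key -> {column name: value}.
    """

    def __init__(self):
        self._rows = {}

    def insert(self, row_key, column_name, value):
        # Create the row on first write, then set (or overwrite) the column.
        self._rows.setdefault(row_key, {})[column_name] = value

    def get_row(self, row_key):
        # Return (column name, value) pairs ordered by column name, which for
        # ASCII text matches the byte-wise ordering described in the thread.
        return sorted(self._rows.get(row_key, {}).items())


# Static column family: one row per user, keyed by email_id.
users_cf = ColumnFamily()
# Query-driven column family: one row per organization, one column per user.
org_users_cf = ColumnFamily()


def add_user(email_id, fullname, organization):
    """Write the user row and duplicate the user into the organization row."""
    users_cf.insert(email_id, "fullname", fullname)
    users_cf.insert(email_id, "organization", organization)
    # The denormalized write: column name is the user, value is the full name.
    org_users_cf.insert(organization, email_id, fullname)


def users_in_org(organization):
    """Answer 'which users belong to this organization' with a single row read."""
    return [email_id for email_id, _ in org_users_cf.get_row(organization)]


add_user("bob@example.com", "Bob", "acme")
add_user("alice@example.com", "Alice", "acme")
print(users_in_org("acme"))
```

The point of the sketch is the shape of the trade-off: the read for the second query is a single row lookup by organization, paid for by one extra insert per user write.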