Hi all,
more than a years ago I wrote a comment for migrating an old schema to a new 
model.
Since the company had other priorities we didn't realize, and now I'm trying 
to upgrade 
my 0.6 data-model to the newest 2.0 model.

The DB contains mainly comments written by users on companies.
Comments must be validated (when they come into the application they are in 
"pending" status,
and then they can be "approved" or "rejected").

The main queries with very intensive use (and that should perform very fast) 
are:

1) Get all approved comments of a company sorted by insertion time
2) Get all approved comments of a user sorted by insertion time
3) Get latest X approved comments in city with a vote higher than Y sorted by 
insertion time 

User/Company comments are less than 100 in 90% of situations: in general when 
dealing with
user and company comments the amount of data is few kilobytes.
Comments in a city can be a more than 200.000 and is a fast-growing number.

In my old data model I had companies table, users table and comments table. 
The last containing the comments and 3 more
column families (company_comments/user_comments/city_comments) containing only 
a set of time-sorted uuid pointers to comments table. 

I have no idea in how many tables I should keep data in new model. I've been 
reading lots of
documentation: to make the model easier I though something like this ...

users and companies table like in the old model. As far as comments:

CREATE TABLE comments (
  location text,
  id timeuuid,
  status text,
  companyid uuid,
  userid uuid,
  text text,
  title text,
  vote varint,
  PRIMARY KEY ((location, status, vote), id)
) WITH CLUSTERING ORDER BY (id DESC);

create index companyid_key on commenti(companyid);
create index userid_key on commenti(userid);

This model should provide, out of the box, the query number 3. 

select * from comments where location='city' and status='approved' and vote in 
(3,4,5) order by id DESC limit X;

But the other 2 queries are made with secondary index and client-side 
intensive.

select * from comments where companyid='123';
select * from comments where userid='123';

And this will retrieve all company/user comments but they are

1 - not filtered by their status
2 - not sorted in any way

Considering the amount of data told before how would you model the platform?

Thanks for any help

Reply via email to