RE: Example Data Modelling

Peer, Oded Mon, 06 Jul 2015 23:04:15 -0700

The suggested data model isn’t optimal for the “end of month” query you want to 
run, since that query does not restrict the partition key (EmpID).
The query would look like “select EmpID, FN, LN, basic from salaries where 
month = 1”, which requires filtering across all partitions and has 
unpredictable performance.
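
To make the point concrete, here is a minimal sketch against the salaries 
table quoted further down in this thread (PRIMARY KEY(EmpID, month)). The 
column and table names come from that definition; nothing else is assumed.

```cql
-- Rejected as written: month is a clustering column and the
-- partition key (EmpID) is not restricted.
SELECT EmpID, FN, LN, basic FROM salaries WHERE month = 1;

-- Accepted, but Cassandra has to scan partitions and filter rows,
-- so latency grows with the total amount of data in the table.
SELECT EmpID, FN, LN, basic FROM salaries WHERE month = 1 ALLOW FILTERING;
```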


For this type of query to be fast you can use the “month” column as the 
partition key and the “EmpID” column as the clustering column.
This approach also has drawbacks:
1. This data model creates a wide row. Depending on the number of employees, 
this partition might be very large. You should limit partition sizes to 25MB.
2. Distributing data according to month means that only a small number of nodes 
will hold all of the salary data for a specific month, which might cause 
hotspots on those nodes.
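
The month-partitioned layout described above could be sketched roughly as 
follows. The table name salaries_by_month is hypothetical, and the columns 
simply mirror the salaries table quoted further down in this thread.

```cql
-- Hypothetical table name; columns copied from the quoted salaries table.
CREATE TABLE salaries_by_month (
  month int,
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  basic int,
  flexible_allowance float,
  PRIMARY KEY (month, EmpID)  -- month = partition key, EmpID = clustering column
);

-- The end-of-month query now reads a single partition, no filtering needed.
SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;
```

Every employee's row for a given month lands in the same partition, which is 
what makes the query fast and also what causes the two drawbacks listed above.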

Choose the approach that works best for you.


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Monday, July 06, 2015 7:04 PM
To: user@cassandra.apache.org
Subject: Re: Example Data Modelling

Hi Srinivasa,

I think you're right. In Cassandra you should favor denormalisation when in 
an RDBMS you find a relationship like this.

I'd suggest a cf like this:
CREATE TABLE salaries (
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  month int,
  basic int,
  flexible_allowance float,
  PRIMARY KEY(EmpID, month)
);

That way the salaries will be partitioned by EmpID and clustered by month, 
which I guess is the natural sorting you want.
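
As an illustration of what that key supports, here is a small sketch of 
per-employee reads. The EmpID value 'E001' is a made-up example.

```cql
-- All salary rows for one employee, returned in ascending month order
-- because month is the clustering column.
SELECT month, basic, flexible_allowance FROM salaries WHERE EmpID = 'E001';

-- A single month's row for that employee.
SELECT basic, flexible_allowance FROM salaries WHERE EmpID = 'E001' AND month = 7;
```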

Hope it helps,
Cheers!

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 6 July 2015 at 13:01, Srinivasa T N 
<seen...@gmail.com<mailto:seen...@gmail.com>> wrote:
Hi,
   I have a basic doubt: I have an RDBMS with the following two tables:

   Emp - EmpID, FN, LN, Phone, Address
   Sal - Month, Empid, Basic, Flexible Allowance

   My use case is to print the Salary slip at the end of each month and the 
slip contains emp name and his other details.

   Now, if I want to have the same in Cassandra, I will have a single cf with 
emp personal details and his salary details.  Is this the right approach?  
Should we have the employee personal details duplicated each month?

Regards,
Seenu.
