25 MB seems very specific. Is there a reason why?

On Tuesday, July 7, 2015, Peer, Oded <oded.p...@rsa.com> wrote:

>  The data model suggested isn’t optimal for the “end of month” query you
> want to run since you are not querying by partition key.
>
> The query would look like “select EmpID, FN, LN, basic from salaries where
> month = 1” which requires filtering and has unpredictable performance.
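>
> As a rough sketch (assuming the salaries table quoted further down, keyed
> by PRIMARY KEY (EmpID, month)), Cassandra rejects this query unless
> filtering is requested explicitly:
>
> ```cql
> -- month is a clustering column and no partition key (EmpID) is given,
> -- so the query is rejected unless ALLOW FILTERING is added. With it,
> -- the query may have to scan every partition in the table.
> SELECT EmpID, FN, LN, basic FROM salaries WHERE month = 1 ALLOW FILTERING;
> ```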
>
>
>
> For this type of query to be fast you can use the “month” column as the
> partition key and “EmpID” as the clustering column.
>
> This approach also has drawbacks:
>
> 1. This data model creates a wide row. Depending on the number of
> employees, this partition might be very large. You should limit partition
> sizes to 25 MB.
>
> 2. Distributing data according to month means that only a small number of
> nodes will hold all of the salary data for a specific month which might
> cause hotspots on those nodes.
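>
> A minimal sketch of this alternative layout. The table name
> salaries_by_month is hypothetical, and the columns are taken from the
> table proposed earlier in the thread:
>
> ```cql
> -- Partition by month so the end-of-month query reads one partition.
> -- EmpID is the clustering column, so rows are sorted by employee.
> CREATE TABLE salaries_by_month (
>   month int,
>   EmpID varchar,
>   FN varchar,
>   LN varchar,
>   basic int,
>   flexible_allowance float,
>   PRIMARY KEY (month, EmpID)
> );
>
> -- Restricts on the partition key, so no filtering is needed.
> SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;
> ```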
>
>
>
> Choose the approach that works best for you.
>
>
>
>
>
> *From:* Carlos Alonso [mailto:i...@mrcalonso.com]
> *Sent:* Monday, July 06, 2015 7:04 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Example Data Modelling
>
>
>
> Hi Srinivasa,
>
>
>
> I think you're right. In Cassandra you should favor denormalisation when
> you find a relationship like this in an RDBMS.
>
>
>
> I'd suggest a cf like this
>
> -- EmpID is the partition key, month is the clustering column
> CREATE TABLE salaries (
>   EmpID varchar,
>   FN varchar,
>   LN varchar,
>   Phone varchar,
>   Address varchar,
>   month int,
>   basic int,
>   flexible_allowance float,
>   PRIMARY KEY (EmpID, month)
> );
>
>
>
> That way the salaries will be partitioned by EmpID and clustered by month,
> which I guess is the natural sorting you want.
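>
> As an illustration of what this layout serves well, here is a sketch of
> per-employee reads against the salaries table above. The EmpID value
> 'E1001' is a made-up example:
>
> ```cql
> -- One employee's slip for a given month: a single row in a single partition.
> SELECT FN, LN, basic, flexible_allowance
> FROM salaries
> WHERE EmpID = 'E1001' AND month = 7;
>
> -- The same employee's first six months, returned in month order.
> SELECT month, basic, flexible_allowance
> FROM salaries
> WHERE EmpID = 'E1001' AND month >= 1 AND month <= 6;
> ```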
>
>
>
> Hope it helps,
>
> Cheers!
>
>
>   Carlos Alonso | Software Engineer | @calonso
> <https://twitter.com/calonso>
>
>
>
> On 6 July 2015 at 13:01, Srinivasa T N <seen...@gmail.com> wrote:
>
> Hi,
>
>    I have a basic doubt: I have an RDBMS with the following two tables:
>
>    Emp - EmpID, FN, LN, Phone, Address
>    Sal - Month, Empid, Basic, Flexible Allowance
>
>    My use case is to print the Salary slip at the end of each month and
> the slip contains emp name and his other details.
>
>    Now, if I want to have the same in Cassandra, I will have a single cf
> with emp personal details and his salary details.  Is this the right
> approach?  Should we have the employee personal details duplicated each
> month?
>
> Regards,
> Seenu.
>
>
>


-- 

- John
