Hi Carlos, Columns in primary key like EmpID can't be static. But remind that EmpID as in the partition key is not duplicated.
-- Jérôme Mainaud jer...@mainaud.com 2015-07-07 16:02 GMT+02:00 Carlos Alonso <i...@mrcalonso.com>: > Hi Jerome, > > Good point!! Really a nice usage of static columns! BTW, wouldn't the > EmpID be static as well? > > Cheers > > Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> > > On 7 July 2015 at 14:42, Jérôme Mainaud <jer...@mainaud.com> wrote: > >> Hello, >> >> You can slightly adapt Carlos answer to reduce repliation of data that >> don't change for month to month. >> Static columns are great for this. >> >> The table become: >> >> CREATE TABLE salaries ( >> EmpID varchar, >> FN varchar *static*, >> LN varchar *static*, >> Phone varchar *static*, >> Address varchar *static*, >> month integer, >> basic integer, >> flexible_allowance float, >> PRIMARY KEY(EmpID, month) >> ) >> >> There is only one copy of static column per partition the value is shared >> between all rows of the partition. >> When Employee data change you can update it with the partition key in the >> where clause. >> When you insert a new month entry you just fill non static columns. >> The table can be queried the same way as the original one. >> >> Cheers >> >> >> >> -- >> Jérôme Mainaud >> jer...@mainaud.com >> >> 2015-07-07 11:51 GMT+02:00 Rory Bramwell, DevOp Services < >> rory.bramw...@devopservices.com>: >> >>> Hi, >>> >>> I've been following this thread and my thoughts are inline with Carlos' >>> latest response... Model your data to suite your queries. That is one of >>> the data model / design considerations in Cassandra that differs from the >>> RDBMS world. Embrace demoralization and data duplication. Disk space is >>> cheapest, so exploit how your data is laid out in order to optimize for >>> faster reads (which are more costly than writes). >>> >>> Regards, >>> >>> Rory Bramwell >>> Founder and CEO >>> DevOp Services >>> >>> Skype: devopservices >>> Email: rory.bramw...@devopservices.com >>> Web: www.devopservices.com >>> On Jul 7, 2015 4:02 AM, "Carlos Alonso" <i...@mrcalonso.com> wrote: >>> >>>> I guess you're right, using my proposal, getting last employee's record >>>> is straightforward and quick, but also, as Peter pointed, getting all slips >>>> for a particular month requires you to know all the employee IDs and, >>>> ideally, run a query for each employee. This would work depending on how >>>> many employees you're managing. >>>> >>>> At this moment I'm beginning to feel that maybe using both approaches >>>> is the best way to go. And I think this is one of Cassandra's >>>> recommendations: Write your data in several formats if required to fit your >>>> reads. Therefore I'd use my suggestion for getting a salary by employee ID >>>> and I'd also have Peter's one to run the end of the month query. >>>> Does it make sense? >>>> >>>> Cheers! >>>> >>>> Carlos Alonso | Software Engineer | @calonso >>>> <https://twitter.com/calonso> >>>> >>>> On 7 July 2015 at 09:07, Srinivasa T N <seen...@gmail.com> wrote: >>>> >>>>> Thanks for the inputs. >>>>> >>>>> Now my question is how should the app populate the duplicate data, >>>>> i.e., if I have an employee record (along with his FN, LN,..) for the >>>>> month >>>>> of Apr and later I am populating the same record for the month of may >>>>> (with >>>>> salary changed), should my application first read/fetch the corresponding >>>>> data for apr and re-insert with modification for month of may? >>>>> >>>>> Regards, >>>>> Seenu. >>>>> >>>>> On Tue, Jul 7, 2015 at 11:32 AM, Peer, Oded <oded.p...@rsa.com> wrote: >>>>> >>>>>> The data model suggested isn’t optimal for the “end of month” query >>>>>> you want to run since you are not querying by partition key. >>>>>> >>>>>> The query would look like “select EmpID, FN, LN, basic from salaries >>>>>> where month = 1” which requires filtering and has unpredictable >>>>>> performance. >>>>>> >>>>>> >>>>>> >>>>>> For this type of query to be fast you can use the “month” column as >>>>>> the partition key and the “EmpID” and the clustering column. >>>>>> >>>>>> This approach also has drawbacks: >>>>>> >>>>>> 1. This data model creates a wide row. Depending on the number of >>>>>> employees this partition might be very large. You should limit partition >>>>>> sizes to 25MB >>>>>> >>>>>> 2. Distributing data according to month means that only a small >>>>>> number of nodes will hold all of the salary data for a specific month >>>>>> which >>>>>> might cause hotspots on those nodes. >>>>>> >>>>>> >>>>>> >>>>>> Choose the approach that works best for you. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *From:* Carlos Alonso [mailto:i...@mrcalonso.com] >>>>>> *Sent:* Monday, July 06, 2015 7:04 PM >>>>>> *To:* user@cassandra.apache.org >>>>>> *Subject:* Re: Example Data Modelling >>>>>> >>>>>> >>>>>> >>>>>> Hi Srinivasa, >>>>>> >>>>>> >>>>>> >>>>>> I think you're right, In Cassandra you should favor denormalisation >>>>>> when in RDBMS you find a relationship like this. >>>>>> >>>>>> >>>>>> >>>>>> I'd suggest a cf like this >>>>>> >>>>>> CREATE TABLE salaries ( >>>>>> >>>>>> EmpID varchar, >>>>>> >>>>>> FN varchar, >>>>>> >>>>>> LN varchar, >>>>>> >>>>>> Phone varchar, >>>>>> >>>>>> Address varchar, >>>>>> >>>>>> month integer, >>>>>> >>>>>> basic integer, >>>>>> >>>>>> flexible_allowance float, >>>>>> >>>>>> PRIMARY KEY(EmpID, month) >>>>>> >>>>>> ) >>>>>> >>>>>> >>>>>> >>>>>> That way the salaries will be partitioned by EmpID and clustered by >>>>>> month, which I guess is the natural sorting you want. >>>>>> >>>>>> >>>>>> >>>>>> Hope it helps, >>>>>> >>>>>> Cheers! >>>>>> >>>>>> >>>>>> Carlos Alonso | Software Engineer | @calonso >>>>>> <https://twitter.com/calonso> >>>>>> >>>>>> >>>>>> >>>>>> On 6 July 2015 at 13:01, Srinivasa T N <seen...@gmail.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have basic doubt: I have an RDBMS with the following two tables: >>>>>> >>>>>> Emp - EmpID, FN, LN, Phone, Address >>>>>> Sal - Month, Empid, Basic, Flexible Allowance >>>>>> >>>>>> My use case is to print the Salary slip at the end of each month >>>>>> and the slip contains emp name and his other details. >>>>>> >>>>>> Now, if I want to have the same in cassandra, I will have a single >>>>>> cf with emp personal details and his salary details. Is this the right >>>>>> approach? Should we have the employee personal details duplicated each >>>>>> month? >>>>>> >>>>>> Regards, >>>>>> Seenu. >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >> >