Hello Shrikar, Yes primary key is (studentID, subjectID). I had dropped the test table, recreating and populating it post which will share the cfhistogram. In such case is there any practical limit on the rows I should fetch, for e.g. should I do select * form marks_table where studentID = ? limit 500; instead of doing select * form marks_table where studentID = ?;
On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak <shrika...@gmail.com> wrote: > Hi Apoorva, > > I assume this is the table with studentId and subjectId as primary keys > and not other like like marks in that. > > create table marks_table(studentId int, subjectId int, marks int, PRIMARY > KEY(studentId,subjectId)); > > Also could you give the cfhistogram stats? > > nodetool cfhistograms <your keyspace> marks_table; > > > > Thanks, > Shrikar > > > On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav <apoorva.gau...@myntra.com > > wrote: > >> Hello All, >> >> We've a schema which can be modeled as (studentID, subjectID, marks) >> where combination of studentID and subjectID is unique. Number of studentID >> can go up to 100 million and for each studentID we can have up to 10k >> subjectIDs. >> >> We are using apahce cassandra 2.0.4 and datastax java driver 1.0.4. We >> are using a four node cluster, each having 24 cores and 32GB memory. I'm >> sure that the machines are not underperformant as on same test bed we've >> consistently received <5ms response times for ~1b documents when queried >> via primary key. >> >> I've tried three approaches, all of which result in significant >> deterioration (>500 ms response time) in read query performance once number >> of subjectIDs goes past ~100 for a studentID. Approaches are :- >> >> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int, >> int>) and query by subjectID >> >> 2. model as (studentID int, subjectID int, marks int, PRIMARY >> KEY(studentID, subjectID) and query as select * from marks_table where >> studentID = ? >> >> 3. model as (studentID int, subjectID int, marks int, PRIMARY >> KEY(studentID, subjectID) and query as select * from marks_table where >> studentID = ? and subjectID in (?, ?, ?....?) number of subjectIDs in >> query being ~1K. >> >> What can be the bottlenecks. Is it better if we model as (studentID int, >> subjct_marks_json text) and query by studentID. >> >> -- >> Thanks & Regards, >> Apoorva >> > > -- Thanks & Regards, Apoorva