Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html On Fri, Apr 4, 2014 at 11:34 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
Hello Shrikar, We are still facing a read latency issue; here is the histogram: http://pastebin.com/yEvMuHYh On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: Hello Shrikar, Yes primary key is (studentID, subjectID). I had dropped the test table, recreating and

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
Hi Apoorva, As per the cfhistogram there are some rows which have more than 75k columns, and around 150k reads hit 2 SSTables. Are you sure that you are seeing more than 500ms latency? The cfhistogram shows the worst read performance was around 51ms, which looks reasonable with many reads hitting

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
At the client side we are getting a latency of ~350ms, we are using datastax driver 2.0.0 and have kept the fetch size as 500. And these are coming while reading rows having ~200 columns. On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak shrika...@gmail.com wrote: Hi Apoorva, As per the

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
How about the client side socket limits? Cassandra client side maximum connection per host and read consistency level? ~Shrikar On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: At the client side we are getting a latency of ~350ms, we are using datastax driver

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
client side socket limit: 64K
client side maximum connection per host: 8
read consistency level: Quorum
On Thu, Apr 3, 2014 at 12:59 PM, Shrikar archak shrika...@gmail.com wrote: How about the client side socket limits? Cassandra client side maximum connection per host and read consistency

Re: Read performance in map data type

2014-04-03 Thread Robert Coli
On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: At the client side we are getting a latency of ~350ms, we are using datastax driver 2.0.0 and have kept the fetch size as 500. And these are coming while reading rows having ~200 columns. And you're sure that the

Re: Read performance in map data type

2014-04-02 Thread Apoorva Gaurav
I've observed that reducing the fetch size results in better latency (isn't that obvious :-)). I tried fetch sizes varying from 100 down to 1 and am seeing a lot of errors for 1. Haven't tried modifying the number of columns. Let me start a new thread focused on fetch size. On Wed, Apr 2, 2014 at

Re: Read performance in map data type

2014-04-01 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202... =Rob
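
The suggestion above, a unique keyspace name for each test instead of dropping and recreating the same one, could be sketched roughly as follows. This is only an illustration and not code from the thread: the helper names `unique_keyspace_name` and `create_keyspace_cql`, the `test` prefix, and the replication settings are assumptions, and the 48-character cap is the commonly documented keyspace name limit.

```python
import re
import uuid

# Assumption: keyspace names may use letters, digits and underscores,
# and are capped at 48 characters.
MAX_KEYSPACE_LEN = 48


def unique_keyspace_name(prefix="test"):
    """Return a fresh keyspace name for one test run (hypothetical helper)."""
    if not re.fullmatch(r"[a-z][a-z0-9_]*", prefix):
        raise ValueError("prefix must start with a letter and use only [a-z0-9_]")
    # uuid4().hex is 32 lowercase hex characters, so names do not repeat
    # across runs in practice.
    name = f"{prefix}_{uuid.uuid4().hex}"
    return name[:MAX_KEYSPACE_LEN]


def create_keyspace_cql(name):
    """Build the CQL statement a test could run to create its own keyspace."""
    # Assumption: single-node test cluster, hence SimpleStrategy with RF 1.
    return (
        f"CREATE KEYSPACE {name} WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 1}"
    )
```

Each test would then create its tables inside the returned keyspace and never reuse a dropped name.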

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
Thanks Sourabh, I've modelled my table as studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID) as primarily I'll be querying using studentID and sometimes using studentID and subjectID. I've tried driver 2.0.0 and it's giving good results. Also using its auto paging feature.
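
As a rough mental model of why PRIMARY KEY(studentID, subjectID) serves both query shapes, here is a small in-memory sketch. It is not Cassandra code: the class `MarksTable` and its method names are hypothetical, and it assumes studentID acts as the partition key while subjectID orders the rows within each partition.

```python
class MarksTable:
    """Toy model: one partition per studentID, rows keyed by subjectID."""

    def __init__(self):
        # studentID -> {subjectID: marks}
        self._partitions = {}

    def insert(self, student_id, subject_id, marks):
        self._partitions.setdefault(student_id, {})[subject_id] = marks

    def by_student(self, student_id):
        """Mimics: SELECT * FROM marks_table WHERE studentID = ?"""
        rows = self._partitions.get(student_id, {})
        # Rows come back ordered by the clustering column (subjectID).
        return [(subject_id, rows[subject_id]) for subject_id in sorted(rows)]

    def by_student_and_subject(self, student_id, subject_id):
        """Mimics: SELECT * FROM marks_table WHERE studentID = ? AND subjectID = ?"""
        return self._partitions.get(student_id, {}).get(subject_id)
```

Both lookups touch a single partition, which is the property the schema choice is after.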

Re: Read performance in map data type

2014-04-01 Thread Sourabh Agrawal
From the doc: "The fetch size controls how much resulting rows will be retrieved simultaneously." So, I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think it wouldn't matter much whatever be the number of columns as long as we
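
To make the fetch size point concrete, the sketch below estimates how many pages a query result is split into for a given fetch size. It only simulates paging in plain Python and does not call the driver: `pages_needed` and `iter_pages` are hypothetical helpers, and it assumes every page except the last holds exactly `fetch_size` rows.

```python
import math


def pages_needed(row_count, fetch_size):
    """Number of pages needed to return row_count rows."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    return math.ceil(row_count / fetch_size)


def iter_pages(rows, fetch_size):
    """Yield successive slices of at most fetch_size rows, like auto paging."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    for start in range(0, len(rows), fetch_size):
        yield rows[start:start + fetch_size]
```

With the figures mentioned in this thread, a result of about 200 rows fits in a single page at fetch size 500, while a 75,000-row partition would take 150 pages at the same fetch size.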

Re: Read performance in map data type

2014-03-31 Thread Robert Coli
On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: Yes primary key is (studentID, subjectID). I had dropped the test table, recreating and populating it post which will share the cfhistogram. In such case is there any practical limit on the rows I should fetch, for

Re: Read performance in map data type

2014-03-31 Thread Apoorva Gaurav
Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. On Mon, Mar 31, 2014 at 11:51 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: Yes primary key is (studentID,

Re: Read performance in map data type

2014-03-30 Thread Sourabh Agrawal
Hi, I don't think there is a problem with the driver. Regarding the schema, you may want to choose between wide rows and skinny rows. http://stackoverflow.com/questions/19039123/cassandra-wide-vs-skinny-rows-for-large-columns http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi Apoorva, Do you always query on studentID only, or do you need to query on both studentID and subjectID? Also, I think using the latest driver (2.x) can make querying a large number of rows efficient. http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 On Sat, Mar 29,

Re: Read performance in map data type

2014-03-29 Thread Apoorva Gaurav
Hello Sourabh, I'd prefer to do a query like select * from marks_table where studentID = ? and subjectID in (?, ?, ??) but if it's costly then I can happily delegate the responsibility to the application layer. Haven't tried the 2.x java driver for this specific issue but tried it once earlier and

Re: Read performance in map data type

2014-03-28 Thread Shrikar archak
Hi Apoorva, I assume this is the table with studentId and subjectId as the primary key, and not other columns like marks in it. create table marks_table(studentId int, subjectId int, marks int, PRIMARY KEY(studentId,subjectId)); Also could you give the cfhistogram stats? nodetool cfhistograms your

Re: Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello Shrikar, Yes, the primary key is (studentID, subjectID). I had dropped the test table; I am recreating and populating it, after which I will share the cfhistogram. In such a case is there any practical limit on the rows I should fetch, e.g. should I do select * from marks_table where studentID =