Re: Timeout error in fetching million rows as results using clustering keys
The rendering tool renders a portion of a very large image. It may fetch different data each time from billions of rows, so I don't think I can cache such large results, since the same results will rarely be fetched again. Also, do you know how I can do 2D range queries using Cassandra? Some other users suggested using Solr, but is there any way I can achieve that without using any other technology?

On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar ali.rac...@gmail.com wrote:

Sorry, meant to say that way when you have to render, you can just display the latest cache.

On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar ali.rac...@gmail.com wrote:

I would probably do this in a background thread and cache the results; that way when you have to render, you can just cache the latest results. I don't know why Cassandra can't seem to fetch large batch sizes; I've also run into these timeouts, but reducing the batch size to 2k seemed to work for me.

On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

We have a UI which needs this data for rendering, so the efficiency of pulling this data matters a lot; it should be fetched within a minute. Is there a way to achieve such efficiency?

On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar ali.rac...@gmail.com wrote:

Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while?

On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Yes, it works for 1000 but not more than that. How can I fetch all rows using this efficiently?

On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote:

Have you tried a smaller fetch size, such as 5k - 2k?

On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Hi Jens,

I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra can handle a million rows easily.
Is there any mistake in the way I am defining the keys or querying them?

Thanks,
Mehak

On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote:

Hi,

Try setting fetchsize before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it.

Cheers,
Jens
– Sent from Mailbox https://www.dropbox.com/mailbox

On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Hi,

I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is:

    CREATE TABLE images.results (
        uuid uuid,
        analysis_execution_id varchar,
        analysis_execution_uuid uuid,
        x double,
        y double,
        loc varchar,
        w double,
        h double,
        normalized varchar,
        type varchar,
        filehost varchar,
        filename varchar,
        image_uuid uuid,
        image_uri varchar,
        image_caseid varchar,
        image_mpp_x double,
        image_mpp_y double,
        image_width double,
        image_height double,
        objective double,
        cancer_type varchar,
        Area float,
        submit_date timestamp,
        points list<double>,
        PRIMARY KEY ((image_caseid), Area, uuid)
    );

Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the Java DataStax API to fetch the results.
But the query is taking a lot of time, resulting in timeout errors:

    Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
        at QueryDB.queryArea(TestQuery.java:59)
        at TestQuery.main(TestQuery.java:35)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
        at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
        at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
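For reference on the "I can change the timeout settings" remark above: in Cassandra 2.x these server-side timeouts live in cassandra.yaml (the values shown are the 2.1 defaults; check your own version's file, since raising them only masks the cost of pulling a huge result in one request):

```yaml
# cassandra.yaml -- server-side request timeouts (Cassandra 2.x defaults)
read_request_timeout_in_ms: 5000     # single-partition reads
range_request_timeout_in_ms: 10000   # range scans
request_timeout_in_ms: 10000         # catch-all for other operations
```

Note that the DataStax Java driver also applies its own per-request read timeout on the client side (12 seconds by default, configurable via SocketOptions), so a timeout can fire on either end; as Jens's fetchsize advice suggests, paging the result is usually the real fix rather than raising either limit.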
Re: Timeout error in fetching million rows as results using clustering keys
Data won't change much, but the queries will be different. I am not working on the rendering tool myself, so I don't know many details about it. Also, as suggested by you, I tried to fetch data in sizes of 500 or 1000 with the Java driver's auto pagination. It fails when the number of records is high (around 10) with the following error:

    Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))

On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar ali.rac...@gmail.com wrote:

How often does the data change? I would still recommend caching of some kind, but without knowing more details (how often the data is changing, what you're doing with the 1m rows after getting them, etc.) I can't recommend a solution. I did see your other thread. I would also vote for Elasticsearch / Solr; they are more suited to the kind of analytics you seem to be doing. Cassandra is more for storing data; it isn't all that great for complex queries / analytics. If you want to stick with Cassandra, you might have better luck if you made your range columns part of the primary key, so something like PRIMARY KEY(caseId, x, y).
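Regarding Ali's PRIMARY KEY(caseId, x, y) suggestion, a sketch may help show what CQL does and doesn't allow (the table below is hypothetical, not the poster's actual schema): with clustering columns (x, y), a range restriction is only permitted on the first clustering column not already fixed by equality, so a 2D bounding box cannot be expressed in a single predicate; the y bounds must be filtered client-side or encoded into bucketed keys.

```sql
-- Hypothetical table for the PRIMARY KEY (caseId, x, y) idea
CREATE TABLE images.results_by_xy (
    image_caseid varchar,
    x double,
    y double,
    uuid uuid,
    PRIMARY KEY ((image_caseid), x, y, uuid)
);

-- Allowed: equality on the partition key plus a range on x.
SELECT * FROM images.results_by_xy
WHERE image_caseid = 'case-1' AND x >= 100 AND x < 200;

-- Not expressible in one query: ranges on both x and y.
-- Once x carries a range restriction, y cannot also be
-- restricted; filter the y bounds on the client instead.
-- SELECT * FROM images.results_by_xy
-- WHERE image_caseid = 'case-1'
--   AND x >= 100 AND x < 200
--   AND y >= 50 AND y < 80;   -- rejected by CQL
```

This is why others in the thread point at Solr/Elasticsearch for true 2D range queries: Cassandra alone only gives an efficient 1D slice per partition.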
Re: schema generation in cassandra
Why are you creating new tables dynamically? I would try to use a static schema and use a collection (list / map / set) for storing arbitrary data.

On Wed, Mar 18, 2015 at 2:52 PM, Ankit Agarwal agarwalankit.k...@gmail.com wrote:

Hi,

I am new to Cassandra. We are planning to use Cassandra for a cloud-based application in our development environment, so I am looking for the best strategies to sync the schema for micro-services while deploying the application on Cloud Foundry. One way I could use is an Accessor interface with the DataStax mapper and the cassandra-core driver.

1.) I have created a keyspace using the core driver, generated on initialization of a servlet:

    public void init() throws ServletException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        String keySpace = "sampletest";
        session.execute("CREATE KEYSPACE IF NOT EXISTS " + keySpace
            + " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }");
        ...
    }

2.) This is my Accessor interface, which I used to generate the query for creating the column family:

    @Accessor
    public interface UserAccessor {
        @Query("CREATE TABLE sampletest.emp (id uuid PRIMARY KEY, name text, department text, location text, phone bigint) "
            + "WITH caching = '{ \"keys\" : \"ALL\", \"rows_per_partition\" : \"NONE\" }'")
        ResultSet create_table();
    }

3.) Creating an instance of the Accessor interface to map the query that generates the column family:

    MappingManager mapper = new MappingManager(session);
    UserAccessor ua = mapper.createAccessor(UserAccessor.class);
    ua.create_table();

4.) So far I have created a keyspace with a column family; now I want to map my data using the POJO class below:

    @Table(keyspace = "sampletest", name = "emp")
    public class Employee {
        @PartitionKey
        private UUID id;
        private String name;
        private String department;
        private String location;
        private Long phone;
        // getters and setters
    }

Is there any other better approach to achieve this, especially for a cloud environment?

--
Thanks
Ankit Agarwal
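A minimal sketch of the static-schema-plus-collection approach Ali suggests (the column names here are made up for illustration, not from the thread): one fixed table where a map column absorbs the arbitrary per-record fields, so no DDL has to run at application startup.

```sql
-- Hypothetical static schema: arbitrary attributes go into a map
CREATE TABLE sampletest.emp (
    id uuid PRIMARY KEY,
    name text,
    attrs map<text, text>
);

INSERT INTO sampletest.emp (id, name, attrs)
VALUES (uuid(), 'Jane Doe',
        {'department': 'Billing', 'location': 'Pune', 'phone': '9998887777'});

-- Individual map entries can be added later without any schema change:
UPDATE sampletest.emp SET attrs['team'] = 'Payments' WHERE id = ...;
```

The trade-off is that everything in the map is stored as text (or whatever single value type you pick), so typed queries on those fields need client-side conversion.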
Re: Timeout error in fetching million rows as results using clustering keys
Yeah, it may be that the process is being limited by swap. This page: https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42 — lines 42 - 48 list a few settings that you could try for increasing / reducing the memory limits (assuming you're on Linux). Also, are you using an SSD? If so, make sure the IO scheduler is noop or deadline.

On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Currently the Cassandra java process is taking 1% of CPU (8% total is being used) and 14.3% of memory (out of 4G total). As you can see, there is not much load from other processes. Should I try changing the default memory parameters in the Cassandra settings?

On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar ali.rac...@gmail.com wrote:

What's your memory / CPU usage at? And how much RAM + CPU do you have on this server?

On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Currently there is only a single node, which I am calling directly, with around 15 rows. The full data will be around billions of rows per node. The code is working only for sizes of 100/200, and consecutive fetches are taking around 5-10 secs. I have a parallel script which is inserting data while I am reading it. When I stopped the script, it worked for 500/1000 but not more than that.

On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar ali.rac...@gmail.com wrote:

If even 500-1000 isn't working, then your cassandra node might not be up.

1) Try running nodetool status from a shell on your cassandra server and make sure the nodes are up.
2) Are you calling this on the same server where cassandra is running? It's trying to connect to localhost. If you're running it on a different server, try passing in the direct IP of your cassandra server.
Re: Timeout error in fetching million rows as results using clustering keys
Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while?

On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Yes, it works for 1000 but not more than that. How can I fetch all rows using this efficiently?

On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote:

Have you tried a smaller fetch size, such as 5k - 2k?

On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Hi Jens,

I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra can handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them?

Thanks,
Mehak

On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote:

Hi,

Try setting fetchsize before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it.

Cheers,
Jens
– Sent from Mailbox https://www.dropbox.com/mailbox

On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

Hi,

I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is:

    CREATE TABLE images.results (
        uuid uuid,
        analysis_execution_id varchar,
        analysis_execution_uuid uuid,
        x double,
        y double,
        loc varchar,
        w double,
        h double,
        normalized varchar,
        type varchar,
        filehost varchar,
        filename varchar,
        image_uuid uuid,
        image_uri varchar,
        image_caseid varchar,
        image_mpp_x double,
        image_mpp_y double,
        image_width double,
        image_height double,
        objective double,
        cancer_type varchar,
        Area float,
        submit_date timestamp,
        points list<double>,
        PRIMARY KEY ((image_caseid), Area, uuid)
    );

Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the Java DataStax API to fetch the results. But the query is taking a lot of time, resulting in timeout errors:

    Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
        at QueryDB.queryArea(TestQuery.java:59)
        at TestQuery.main(TestQuery.java:35)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
        at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
        at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Also when I try the same query on the console, even while using a limit of 2000 rows:

    cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
    errors={}, last_host=127.0.0.1

Thanks and Regards,
Mehak
schema generation in cassandra
Hi,

I am new to Cassandra. We are planning to use Cassandra for a cloud-based application in our development environment, so I am looking for the best strategies to sync the schema for micro-services while deploying the application on Cloud Foundry. One way I could use is an Accessor interface with the DataStax mapper and the cassandra-core driver.

1.) I have created a keyspace using the core driver, generated on initialization of a servlet:

    public void init() throws ServletException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        String keySpace = "sampletest";
        session.execute("CREATE KEYSPACE IF NOT EXISTS " + keySpace
            + " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }");
        ...
    }

2.) This is my Accessor interface, which I used to generate the query for creating the column family:

    @Accessor
    public interface UserAccessor {
        @Query("CREATE TABLE sampletest.emp (id uuid PRIMARY KEY, name text, department text, location text, phone bigint) "
            + "WITH caching = '{ \"keys\" : \"ALL\", \"rows_per_partition\" : \"NONE\" }'")
        ResultSet create_table();
    }

3.) Creating an instance of the Accessor interface to map the query that generates the column family:

    MappingManager mapper = new MappingManager(session);
    UserAccessor ua = mapper.createAccessor(UserAccessor.class);
    ua.create_table();

4.) So far I have created a keyspace with a column family; now I want to map my data using the POJO class below:

    @Table(keyspace = "sampletest", name = "emp")
    public class Employee {
        @PartitionKey
        private UUID id;
        private String name;
        private String department;
        private String location;
        private Long phone;
        // getters and setters
    }

Is there any other better approach to achieve this, especially for a cloud environment?

--
Thanks
Ankit Agarwal
Re: Timeout error in fetching million rows as results using clustering keys
4g also seems small for the kind of load you are trying to handle (billions of rows, etc). I would also try adding more nodes to the cluster.

On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar ali.rac...@gmail.com wrote:

Yeah, it may be that the process is being limited by swap. This page: https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42 — lines 42 - 48 list a few settings that you could try for increasing / reducing the memory limits (assuming you're on Linux). Also, are you using an SSD? If so, make sure the IO scheduler is noop or deadline.
Re: Timeout error in fetching million rows as results using clustering keys
How often does the data change? I would still recommend caching of some kind, but without knowing more details (how often the data is changing, what you're doing with the 1m rows after getting them, etc.) I can't recommend a solution. I did see your other thread. I would also vote for Elasticsearch / Solr; they are more suited to the kind of analytics you seem to be doing. Cassandra is more for storing data; it isn't all that great for complex queries / analytics. If you want to stick with Cassandra, you might have better luck if you made your range columns part of the primary key, so something like PRIMARY KEY(caseId, x, y).

On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote:

The rendering tool renders a portion of a very large image. It may fetch different data each time from billions of rows, so I don't think I can cache such large results, since the same results will rarely be fetched again. Also, do you know how I can do 2D range queries using Cassandra? Some other users suggested using Solr, but is there any way I can achieve that without using any other technology?
Re: Timeout error in fetching million rows as results using clustering keys
Ya, I have a cluster of 10 nodes in total, but I am just testing with one node currently. The total data across all nodes will exceed 5 billion rows. But I may have more memory on the other nodes. On Wed, Mar 18, 2015 at 6:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: 4g also seems small for the kind of load you are trying to handle (billions of rows), etc. I would also try adding more nodes to the cluster. On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar ali.rac...@gmail.com wrote: Yeah, it may be that the process is being limited by swap. This page: https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42 Lines 42 - 48 list a few settings that you could try out for increasing / reducing the memory limits (assuming you're on linux). Also, are you using an SSD? If so, make sure the IO scheduler is noop or deadline. On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Currently the Cassandra java process is taking 1% of cpu (total 8% is being used) and 14.3% memory (out of 4G total memory). As you can see there is not much load from other processes. Should I try changing the default memory parameters in the Cassandra settings? On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar ali.rac...@gmail.com wrote: What's your memory / CPU usage at? And how much ram + cpu do you have on this server? On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Currently there is only a single node, which I am calling directly, with around 15 rows. The full data will be around billions of rows per node. The code is working only for fetch sizes of 100/200. Also, each consecutive fetch is taking around 5-10 secs. I have a parallel script which is inserting data while I am reading it. When I stopped the script it worked for 500/1000, but not more than that. On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar ali.rac...@gmail.com wrote: If even 500-1000 isn't working, then your cassandra node might not be up.
1) Try running nodetool status from a shell on your cassandra server and make sure the nodes are up. 2) Are you calling this on the same server where cassandra is running? It's trying to connect to localhost. If you're running it on a different server, try passing in the direct ip of your cassandra server. On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: The data won't change much, but the queries will be different. I am not working on the rendering tool myself, so I don't know many details about it. Also, as suggested by you, I tried to fetch data in batches of 500 or 1000 with the java driver's auto pagination. It fails when the number of records is high (around 10) with the following error: Exception in thread main com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar ali.rac...@gmail.com wrote: How often does the data change? I would still recommend caching of some kind, but without knowing more details (how often the data is changing, what you're doing with the 1m rows after getting them, etc.) I can't recommend a solution. I did see your other thread. I would also vote for elasticsearch / solr; they are more suited for the kind of analytics you seem to be doing. Cassandra is more for storing data; it isn't all that great for complex queries / analytics. If you want to stick with cassandra, you might have better luck if you made your range columns part of the primary key, so something like PRIMARY KEY (caseId, x, y) On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: The rendering tool renders a portion of a very large image. It may fetch different data each time from billions of rows, so I don't think I can cache such large results, since the same results will rarely be fetched again.
Also, do you know how I can do 2d range queries using Cassandra? Some other users suggested using Solr, but is there any way I can achieve that without using any other technology? On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar ali.rac...@gmail.com wrote: Sorry, meant to say that that way, when you have to render, you can just display the latest cache. On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar ali.rac...@gmail.com wrote: I would probably do this in a background thread and cache the results; that way when you have to render, you can just use the latest results. I don't know why Cassandra can't seem to be able to fetch large batch sizes; I've also run into these timeouts, but reducing the batch size to 2k seemed to work for me. On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: We have a UI interface which needs this data for rendering, so the efficiency of pulling this data matters a lot. It should be fetched within a minute. Is there a
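The "is the node up" check suggested above, as concrete commands (the client-side host is a placeholder, not from the thread):

```shell
# On the Cassandra server: confirm the node reports itself as Up/Normal (UN).
nodetool status

# From the client machine: confirm the native-protocol port is reachable
# (replace <cassandra-host> with the node's address).
nc -zv <cassandra-host> 9042
```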
Re: Timeout error in fetching million rows as results using clustering keys
We have a UI interface which needs this data for rendering, so the efficiency of pulling this data matters a lot. It should be fetched within a minute. Is there a way to achieve such efficiency? On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while? On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Yes, it works for 1000 but not more than that. How can I fetch all the rows efficiently? On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote: Have you tried a smaller fetch size, such as 5k - 2k? On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra could handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks, Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting on clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement.
My table definition is: CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid)); Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the DataStax Java driver to fetch the results, but the query is taking a lot of time, resulting in timeout errors: Exception in thread main com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also, the same query times out in cqlsh even with a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
Re: Timeout error in fetching million rows as results using clustering keys
I would probably do this in a background thread and cache the results; that way when you have to render, you can just use the latest results. I don't know why Cassandra can't seem to be able to fetch large batch sizes; I've also run into these timeouts, but reducing the batch size to 2k seemed to work for me. On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: We have a UI interface which needs this data for rendering, so the efficiency of pulling this data matters a lot. It should be fetched within a minute. Is there a way to achieve such efficiency? On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while? On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Yes, it works for 1000 but not more than that. How can I fetch all the rows efficiently? On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote: Have you tried a smaller fetch size, such as 5k - 2k? On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra could handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks, Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting on clustering columns, so why are the queries taking so long?
I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is: CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid)); Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the DataStax Java driver to fetch the results, but the query is taking a lot of time, resulting in timeout errors: Exception in thread main com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at
com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also, the same query times out in cqlsh even with a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
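Ali's background-refresh idea above can be sketched with plain JDK classes; the class and method names here are invented for illustration, and the String payload stands in for whatever row type the fetch produces:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Refreshes query results on a background thread; the render path only
 *  ever reads the latest completed snapshot and never blocks on the database. */
public class ResultCache {
    private final AtomicReference<List<String>> latest =
            new AtomicReference<>(List.of());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs one refresh now, keeping the old snapshot if the fetch fails. */
    public void refreshNow(Callable<List<String>> fetch) {
        try {
            latest.set(fetch.call());
        } catch (Exception e) {
            // keep serving the stale snapshot if the fetch times out
        }
    }

    /** Schedules periodic refreshes in the background. */
    public void start(Callable<List<String>> fetch, long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> refreshNow(fetch),
                0, periodSeconds, TimeUnit.SECONDS);
    }

    public List<String> snapshot() { return latest.get(); }

    public void stop() { scheduler.shutdownNow(); }

    public static void main(String[] args) {
        ResultCache cache = new ResultCache();
        cache.refreshNow(() -> List.of("tile-1", "tile-2"));
        System.out.println(cache.snapshot().size()); // prints 2
        cache.stop();
    }
}
```

The render thread calls snapshot() and is never exposed to driver timeouts; a failed refresh simply leaves the previous result in place.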
Re: Timeout error in fetching million rows as results using clustering keys
Sorry, I meant to say that that way, when you have to render, you can just display the latest cache. On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar ali.rac...@gmail.com wrote: I would probably do this in a background thread and cache the results; that way when you have to render, you can just use the latest results. I don't know why Cassandra can't seem to be able to fetch large batch sizes; I've also run into these timeouts, but reducing the batch size to 2k seemed to work for me. On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: We have a UI interface which needs this data for rendering, so the efficiency of pulling this data matters a lot. It should be fetched within a minute. Is there a way to achieve such efficiency? On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar ali.rac...@gmail.com wrote: Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while? On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Yes, it works for 1000 but not more than that. How can I fetch all the rows efficiently? On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote: Have you tried a smaller fetch size, such as 5k - 2k? On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra could handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks, Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it.
Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting on clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is: CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid)); Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the DataStax Java driver to fetch the results.
But the query is taking a lot of time, resulting in timeout errors: Exception in thread main com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also, the same query times out in cqlsh even with a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
Re: Timeout error in fetching million rows as results using clustering keys
Cassandra can certainly handle millions and even billions of rows, but... it is a very clear anti-pattern to design a single query to return more than a relatively small number of rows except through paging. How small? Low hundreds is probably a reasonable limit. It is also an anti-pattern to filter or analyze a large number of rows in a single query - that's why there are so many crazy restrictions and the requirement to use ALLOW FILTERING - to reinforce that Cassandra is designed for short and performant queries, not large-scale retrieval of a large number of rows. As a general rule, the use of ALLOW FILTERING is an anti-pattern and a yellow flag that you are doing something wrong. As a minor point, check your partition key - you should try to bucket rows that will tend to be accessed together so that they have locality and can be fetched together. Rather than using a raw x and y coordinate range, consider indexing by a chunk number; then you can query by chunk number for direct access to the partition and row key, without the need for inequality filtering. -- Jack Krupansky On Wed, Mar 18, 2015 at 3:22 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra could handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks, Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting on clustering columns, so why are the queries taking so long?
I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is: CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid)); Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the DataStax Java driver to fetch the results, but the query is taking a lot of time, resulting in timeout errors: Exception in thread main com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at
com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also, the same query times out in cqlsh even with a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
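Jack's chunk-number suggestion amounts to mapping each (x, y) point to a grid cell and making the cell id part of the key. A self-contained sketch of that mapping; the cell size and id encoding here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Maps (x, y) coordinates to grid-cell ids so a 2D range query becomes a
 *  handful of direct partition lookups instead of inequality filtering. */
public class GridBucket {
    static final double CELL = 1000.0;   // cell edge length; tune per image
    static final long AXIS = 1_000_000L; // assumes fewer than 1e6 cells per axis

    static long cellId(double x, double y) {
        return (long) Math.floor(x / CELL) * AXIS + (long) Math.floor(y / CELL);
    }

    /** All cell ids overlapping the rectangle [x1,x2] x [y1,y2]; each id can
     *  be queried as its own partition, in parallel. */
    static List<Long> cellsForRange(double x1, double y1, double x2, double y2) {
        List<Long> cells = new ArrayList<>();
        for (long cx = (long) Math.floor(x1 / CELL);
                cx <= (long) Math.floor(x2 / CELL); cx++) {
            for (long cy = (long) Math.floor(y1 / CELL);
                    cy <= (long) Math.floor(y2 / CELL); cy++) {
                cells.add(cx * AXIS + cy);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // A 1500x1500 viewport at the origin overlaps 2x2 = 4 cells.
        System.out.println(cellsForRange(0, 0, 1500, 1500)); // prints [0, 1, 1000000, 1000001]
    }
}
```

With a schema like PRIMARY KEY ((image_caseid, cell_id), uuid), rendering a viewport means computing cellsForRange for the visible rectangle and issuing one small query per cell.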
Not seeing keyspace in nodetool compactionhistory
When I run nodetool compactionhistory, I'm only seeing the system keyspace and the OpsCenter keyspace in the compactions. I see only one mention of my own keyspace, and it's for the smallest table within that keyspace (containing only about 1k rows). My two other tables, containing 1.1m and 100k rows respectively, are nowhere to be seen. Any reason why that is? I did fill up the data in those two tables within the span of about 4 hours (I ran a script to migrate existing data from legacy rdbms dbs). Could that have something to do with it? I'm using SizeTieredCompactionStrategy for all tables.
Re: Timeout error in fetching million rows as results using clustering keys
From your description, it sounds like you have a single partition key with millions of clustered values on the same partition. That's a very wide partition. You may very likely be causing a lot of memory pressure in your Cassandra node (especially at 4G) while trying to execute the query. Although the hard upper limit is 2 billion values per partition key, the practical limit is much lower, sometimes more like 100k. Also with very wide partitions, you cannot take advantage of Cassandra's distributed nature for reads: only one node will be involved in the read, so one node will perform as well as a million nodes. If bounding by area is a common task, then it might make sense to put area, or at least part of area, into the partition key (bucket by area / 10 or / 100 or something) just to distribute the data around your cluster a little better. It makes your query path a little more involved, but it buys you parallelism (you could execute all area buckets in a given query simultaneously, and if your cluster is large enough, typically only one node is involved for each area bucket). I wonder what your write pattern is like to fill in the data for a given case ID. Are you appending to the same partition key over a long period of time? If so, you may be scattering the data for a given partition key over a large number of SSTables, and slowing down the read dramatically. If you're using size tiered compaction, run nodetool compact on that table and wait for the node to settle down (0 outstanding/pending tasks in nodetool compactionstats), then see if performance improves (you may also be able to use nodetool cfhistograms to see how many sstables are typically involved in a read, but if all your queries are timing out, I'm not sure if that will be an accurate reflection or not). You wrote: "It may fetch different data each time from billions of rows" and "My expectations were that Cassandra can handle a million rows easily."
I have a data set several orders of magnitude larger than what you're talking about WRT your final data size, and with appropriate query and storage patterns, Cassandra can definitely handle this kind of data. One final note: your column names are pretty long. You pay to store each column name each time you store that column. On small data sets it doesn't matter, but at billions of rows it starts to add up. There's negligible (but nonzero) performance cost, but over time you may find that you have to scale out just because you're filling up disks. See http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html On Wed, Mar 18, 2015 at 6:19 AM, Jack Krupansky jack.krupan...@gmail.com wrote: Cassandra can certainly handle millions and even billions of rows, but... it is a very clear anti-pattern to design a single query to return more than a relatively small number of rows except through paging. How small? Low hundreds is probably a reasonable limit. It is also an anti-pattern to filter or analyze a large number of rows in a single query - that's why there are so many crazy restrictions and the requirement to use ALLOW FILTERING - to reinforce that Cassandra is designed for short and performant queries, not large-scale retrieval of a large number of rows. As a general rule, the use of ALLOW FILTERING is an anti-pattern and a yellow flag that you are doing something wrong. As a minor point, check your partition key - you should try to bucket rows that will tend to be accessed together so that they have locality and can be fetched together. Rather than using a raw x and y coordinate range, consider indexing by a chunk number; then you can query by chunk number for direct access to the partition and row key, without the need for inequality filtering.
-- Jack Krupansky On Wed, Mar 18, 2015 at 3:22 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra could handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks, Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting on clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is: CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
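The area-bucketing suggestion above, as a hypothetical alternative table; the table name, the bucket width of 10, and the trimmed column list are all illustrative, not from the thread:

```sql
-- One case's rows spread over many partitions: the partition key gains an
-- area_bucket computed client-side, e.g. (int) Math.floor(area / 10).
CREATE TABLE images.results_by_area (
    image_caseid varchar,
    area_bucket  int,
    area         float,
    uuid         uuid,
    x            double,
    y            double,
    -- ...remaining payload columns as in images.results
    PRIMARY KEY ((image_caseid, area_bucket), area, uuid)
);

-- A range such as 20 < area < 100 touches buckets 2..10; each SELECT hits
-- a single partition, and the statements can be issued in parallel.
SELECT * FROM images.results_by_area
 WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
   AND area_bucket = 3
   AND area > 20 AND area < 100;
```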
Recommended TTL time for max. performance with DateTieredCompactionStrategy?
I have a table which is going to store temporary search results. The results will be available for a short time (anywhere from 1 to 24 hours) from the time of the search, and then should be deleted to free up disk space. This applies to all the rows within the table. What would be the recommended TTL for this table, so that it works best with DateTieredCompactionStrategy and causes whole SSTables to get deleted rather than keeping tombstones? Thanks.
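A sketch of how a table-level TTL can line up with DateTieredCompactionStrategy (the table name and columns are invented). The specific TTL value likely matters less than its uniformity: when every row carries the same table-level TTL and rows are never updated, the rows in each SSTable expire together, so the whole file can be dropped instead of compacting tombstones:

```sql
CREATE TABLE search_results (
    search_id uuid,
    rank      int,
    result    text,
    PRIMARY KEY (search_id, rank)
) WITH default_time_to_live = 86400   -- 24h, the upper bound mentioned
  AND compaction = {'class': 'DateTieredCompactionStrategy'};
```

Note that default_time_to_live applies only to writes that don't set their own TTL; mixing per-write TTLs from 1h to 24h in the same table means an SSTable is held until its longest-lived row expires, so a single uniform TTL (or separate tables per lifetime) serves this goal best.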
RE: Problems after trying a migration
Hi Fabien, Thank you for the link! That's exactly what we want to do. But before starting this, we need to clean up the mess in order to get a clean cluster. Thanks for your help. Best regards, David CHARBONNIER Sysadmin T : +33 411 934 200 david.charbonn...@rgsystem.com ZAC Aéroport 125 Impasse Adam Smith 34470 Pérols - France www.rgsystem.com From: Fabien Rousseau [mailto:fab...@yakaz.com] Sent: Wednesday, March 18, 2015 17:32 To: user Subject: Re: Problems after trying a migration Hi David, There is an excellent article which describes exactly what you want to do (ie migrate from one DC to another DC): http://planetcassandra.org/blog/cassandra-migration-to-ec2/ 2015-03-18 17:05 GMT+01:00 David CHARBONNIER david.charbonn...@rgsystem.com: Hi, We're using Cassandra through the Datastax Enterprise package in version 4.5.1 (Cassandra version 2.0.8.39) with 7 nodes in a single datacenter. We need to move our Cassandra cluster from France to another country. To do this, we want to add a second 7-node datacenter to our cluster and stream all data between the two countries before dropping the first datacenter. On January 31st, we tried doing so but we had some problems: - The new nodes in the other country were installed like the French nodes except for the Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country) - The following procedure was followed: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html but an error occurred during step 3. The new nodes were started before the cassandra-topology.properties file was updated on the original datacenter, so the new nodes appeared in the original datacenter instead of the new one.
- To recover our original cluster, we decommissioned every node of the new datacenter with the nodetool decommission command. On February 9th, the nodes in the second datacenter were restarted and joined the cluster. We had to decommission them just like before. On February 11th, we added disk space on our 7 running French nodes. To achieve this, we restarted the cluster, but the nodes updated their peering information and the nodes from Luxembourg (decommissioned on February 9th) were present. This behaviour is described here: https://issues.apache.org/jira/browse/CASSANDRA-7825. So we cleaned the system.peers table content. On March 11th, we needed to add an 8th node to our existing French cluster. We installed the same Datastax Enterprise version (4.5.1 with Cassandra 2.0.8.39) and tried to add this node to the cluster with this procedure: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html. In OpsCenter, the node was joining the cluster and data streaming got stuck at 100%. After several hours, nodetool status showed us that the node was still joining, but nothing in the logs let us know there was a problem. We restarted the node but it had no effect. Then we cleaned the data and commitlog contents and tried to add the node to the cluster again, but without result. The last try was to add the node with auto_bootstrap: false in order to add the node to the cluster manually, but it messed up the data. So we shut down the node and decommissioned it (with nodetool removenode). The whole cluster has been repaired and we stopped doing anything. Now, our cluster has only 7 French nodes and we can't add any node. The OpsCenter data has disappeared and we work without any information about how our cluster is running. You'll find attached to this email our current configuration and a screenshot of our OpsCenter metrics page.
Do you have some idea on how to clean up the mess and get our cluster running cleanly before we start our migration (France to another country like described in the beginning of this email)? Thank you. Best regards, [cid:image001.png@01D061A4.2E073720] David CHARBONNIER Sysadmin T : +33 411 934 200 david.charbonn...@rgsystem.commailto:david.charbonn...@rgsystem.com ZAC Aéroport 125 Impasse Adam Smith 34470 Pérols - France www.rgsystem.comhttp://www.rgsystem.com/ [cid:image004.png@01D061A4.2E073720] -- Fabien Rousseau [http://www.yakaz.com/img/logo_yakaz_small.png] www.yakaz.comhttp://www.yakaz.com/
Re: Problems after trying a migration
Hi David, There is an excellent article which describes exactly what you want to do (i.e. migrate from one DC to another DC): http://planetcassandra.org/blog/cassandra-migration-to-ec2/ 2015-03-18 17:05 GMT+01:00 David CHARBONNIER david.charbonn...@rgsystem.com: [...] -- Fabien Rousseau aur...@yakaz.com www.yakaz.com
Saving a file using cassandra
Hello, Finally, I have created my ring using Cassandra. Please, I'd like to store a file replicated 2 times in my cluster. Is that possible? Can you please send me a link to a tutorial? Thanks a lot. Best Regards.
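[Editor's note: replication count is a property of the keyspace, not of individual writes, so storing a file in a keyspace with replication_factor 2 keeps it on two nodes. A minimal CQL sketch; the keyspace, table, and column names here are hypothetical, not from any tutorial:

```sql
-- Hypothetical keyspace replicated on 2 nodes
CREATE KEYSPACE IF NOT EXISTS filestore
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

-- Hypothetical table storing file contents as a blob
CREATE TABLE filestore.files (
    filename text PRIMARY KEY,
    content  blob
);
```

Blob values are read whole into memory, so this suits small files; large files are usually split into chunks across multiple rows.]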
Upgrade from 1.2.19 to 2.0.12 -- seeing lots of SliceQueryFilter messages in system.log
After upgrading a 3 node Cassandra cluster from 1.2.19 to 2.0.12, I have an event storm of SliceQueryFilter messages flooding the Cassandra system.log file.

WARN [ReadStage:1043] 2015-03-18 15:14:12,708 SliceQueryFilter.java (line 231) Read 201 live and 13539 tombstoned cells in KeyspaceMetadata.CF_Folder (see tombstone_warn_threshold). 200 columns was requested, slices=[154184c2-85c1-11e2-b12e-c2ed2ac02b21-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[dc70cafe-ed8a-11e2-a178-5756012ec923-dc70cafe-ed8a-11e2-a178-5756012ec923:!, deletedAt=1424741296925196, localDeletion=1424741340][82bcb57a-ed8c-11e2-8fbd-3fb065c6b097-82bcb57a-ed8c-11e2-8fbd-3fb065c6b097:!, deletedAt=1424741296925196,...

This is the table definition referenced above:

CREATE TABLE CF_Folder (
  key blob,
  column1 timeuuid,
  column2 blob,
  value blob,
  PRIMARY KEY ((key), column1, column2)
) WITH COMPACT STORAGE
  AND bloom_filter_fp_chance=0.10
  AND dclocal_read_repair_chance=0.00
  AND gc_grace_seconds=518400
  AND read_repair_chance=0.10
  AND default_time_to_live=0
  AND speculative_retry='99.0PERCENTILE'
  AND compaction={'sstable_size_in_mb': '160', 'class': 'LeveledCompactionStrategy'}
  AND compression={'sstable_compression': 'SnappyCompressor'};

How can I stop this event storm? Thanks, Rafael Caraballo Time Warner Cable This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. 
If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
Re: Timeout error in fetching million rows as results using clustering keys
Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster as per my requirement. My table definition is:

CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid));

Here each row is uniquely identified by its unique uuid, but since my data is generally queried by image_caseid, I have made that the partition key. I am currently using the DataStax Java driver to fetch the results.
But the query is taking a lot of time, resulting in timeout errors:

Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
	at QueryDB.queryArea(TestQuery.java:59)
	at TestQuery.main(TestQuery.java:35)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
	at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

Also when I try the same query on the console, even while using a limit of 2000 rows:

cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area < 100 and Area > 20 limit 2000;
errors={}, last_host=127.0.0.1

Thanks and Regards, Mehak
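[Editor's note: the fetch-size advice in this thread can be sketched with the DataStax Java driver 2.x. This is a sketch only, assuming a reachable cluster on localhost; the fetch size of 2000 is the value suggested later in the thread, and `process` is a hypothetical handler:

```java
// Sketch: page through a large partition instead of fetching it in one response.
// Assumes the DataStax Java driver 2.x on the classpath.
Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = cluster.connect();

Statement stmt = new SimpleStatement(
        "SELECT * FROM images.results WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'");
stmt.setFetchSize(2000);  // rows per page; the server timeout now applies per page

ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    process(row);  // the driver transparently fetches the next page as you iterate
}
```

With paging, each round trip only has to return one page within the server-side timeout, which is why a huge fetch size (or a single-shot million-row query) times out while 1000-2000 does not.]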
Re: Timeout error in fetching million rows as results using clustering keys
Hi Jens, I have tried with a fetch size of 1 and still it's not giving any results. My expectation was that Cassandra can handle a million rows easily. Is there any mistake in the way I am defining the keys or querying them? Thanks Mehak On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se wrote: [...]
Re: Timeout error in fetching million rows as results using clustering keys
Have you tried a smaller fetch size, such as 5k - 2k ? On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu wrote: [...]
Re: Timeout error in fetching million rows as results using clustering keys
Yes, it works for 1000 but not more than that. How can I fetch all rows using this efficiently? On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote: [...]
Re: schema generation in cassandra
Thanks a lot for your responses! My question is: what are the best practices for database schema deployment for a microservice in a cloud environment? E.g., should the schema be created along with deployment of the microservice, generated via code, or not generated via code at all and instead managed separately? On Wed, Mar 18, 2015 at 3:29 PM, Ali Akhtar ali.rac...@gmail.com wrote: Why are you creating new tables dynamically? I would try to use a static schema and use a collection (list / map / set) for storing arbitrary data. On Wed, Mar 18, 2015 at 2:52 PM, Ankit Agarwal agarwalankit.k...@gmail.com wrote: Hi, I am new to Cassandra. We are planning to use Cassandra for a cloud-based application in our development environment, so I am looking for the best strategies to sync the schema for micro-services while deploying the application on Cloud Foundry. One way I could use is an Accessor interface with the DataStax mapper and the Cassandra core driver.

1.) I have created a keyspace using the core driver, generated on initialization of a servlet:

public void init() throws ServletException {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    String keySpace = "sampletest";
    session.execute("CREATE KEYSPACE IF NOT EXISTS " + keySpace
        + " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }");
    ...
}

2.) This is my Accessor interface, which I used to generate the query for creating the column family:

@Accessor
public interface UserAccessor {
    @Query("CREATE TABLE sampletest.emp (id uuid PRIMARY KEY, name text, department text, location text, phone bigint) "
        + "WITH caching = '{ \"keys\" : \"ALL\", \"rows_per_partition\" : \"NONE\" }'")
    ResultSet create_table();
}

3.) Creating an instance of the Accessor interface to map the query that generates the column family:

MappingManager mapper = new MappingManager(session);
UserAccessor ua = mapper.createAccessor(UserAccessor.class);
ua.create_table();

4.) So far I have created a keyspace with a column family; now I want to map my data using the POJO class mentioned below:

@Table(keyspace = "sampletest", name = "emp")
public class Employee {
    @PartitionKey
    private UUID id;
    private String name;
    private String department;
    private String location;
    private Long phone;
    // getter/setter methods
    ...
}

Is there any other, better approach to achieve this, especially for a cloud environment? -- Thanks Ankit Agarwal -- Thanks & Regards Ankit Agarwal +91-9953235575
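[Editor's note: Ali's static-schema suggestion above can be sketched in CQL. The `extras` map column is a hypothetical name, not from the original code:

```sql
-- One fixed table instead of dynamically created ones; arbitrary per-row
-- fields go into a map collection (hypothetical 'extras' column).
CREATE TABLE sampletest.emp (
    id uuid PRIMARY KEY,
    name text,
    department text,
    location text,
    phone bigint,
    extras map<text, text>
);

-- Writes can then attach ad-hoc attributes without any schema change:
INSERT INTO sampletest.emp (id, name, extras)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Ankit', { 'badge_color' : 'blue' });
```

One caveat: collections are read whole, so a map suits a modest number of small ad-hoc fields, not unbounded data.]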
Re: Problems after trying a migration
Hi David; some input to get back to where you were:
a) Start with the French cluster only and get it working with DSE 4.5.1.
b) The OpsCenter keyspace is by default RF 1; alter the keyspace to RF 3.
c) Take a full snapshot of all your nodes and copy the files to a safe location on all the nodes.
To migrate the data into the new cluster:
a) Use the same version, DSE 4.5.1, in Luxembourg and bring up 1 node at a time. Check that the node has come up in the new datacenter.
b) Bring up new nodes into the new datacenter one at a time.
c) After all your new nodes are UP in Luxembourg, conduct a parallel 'nodetool repair'.
d) Check in OpsCenter that you have all your nodes showing up (new and old).
e) Start taking down your nodes in France, one at a time.
f) After all the nodes in France are down, conduct a parallel 'nodetool repair' again.
g) Upgrade the nodes in Luxembourg to DSE 4.6.1.
h) Conduct a parallel 'nodetool repair' again.
i) Upgrade to OpsCenter 5.1.
Best of luck, hope this helps. Jan/ On Wednesday, March 18, 2015 1:01 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER david.charbonn...@rgsystem.com wrote: - New nodes in the other country have been installed like the French nodes except for the DataStax Enterprise version (4.5.1 in France and 4.6.1 in the other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country) This is officially unsupported, and might cause problems during this process. =Rob
Re: Upgrade from 1.2.19 to 2.0.12 -- seeing lots of SliceQueryFilter messages in system.log
On Wed, Mar 18, 2015 at 10:14 AM, Caraballo, Rafael rafael.caraba...@twcable.com wrote: After upgrading a 3 node Cassandra cluster from 1.2.19 to 2.0.12, I have an event storm of "SliceQueryFilter" messages flooding the Cassandra system.log file. How can I stop this event storm? As the message says: (see tombstone_warn_threshold). The thing you are being warned about is that your write pattern results in a significant number of tombstones. In general this is a smell of badness in Cassandra, which is why the log message exists. To resolve: 1) Increase tombstone_warn_threshold, OR 2) Stop creating so many tombstones. =Rob http://twitter.com/rcolidba
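[Editor's note: option 1 is a cassandra.yaml change. The default values shown are the 2.0.x defaults to the best of my recollection; verify against your own yaml:

```yaml
# cassandra.yaml -- tombstone thresholds (2.0.x defaults, as I recall)
tombstone_warn_threshold: 1000       # log a warning when a slice reads this many tombstones
tombstone_failure_threshold: 100000  # abort the read beyond this many
```

Raising the warn threshold only silences the log; the tombstones are still scanned on every read, so option 2 (plus letting compaction purge them after gc_grace_seconds) addresses the actual cost.]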
Limit on number of columns
Hello, For the limit on the number of cells (http://wiki.apache.org/cassandra/CassandraLimitations) (columns * rows) per partition, I wonder what we mean by number of columns, since different rows may have different columns? Is the number of columns the number of columns of the biggest row, or the union of all columns across all rows? E.g. if I have two rows, one has ten columns and the other has ten different columns, would that be considered a total of ten or twenty columns? Thanks! Best, Oliver Oliver Ruebenacker | Solutions Architect Altisource™ 290 Congress St, 7th Floor | Boston, Massachusetts 02210 P: (617) 728-5582 | ext: 275585 oliver.ruebenac...@altisource.com | www.Altisource.com *** This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses. ***
Re: Limit on number of columns
On Wed, Mar 18, 2015 at 12:43 PM, Ruebenacker, Oliver A oliver.ruebenac...@altisource.com wrote: For the limit on the number of cells (http://wiki.apache.org/cassandra/CassandraLimitations) (columns * rows) per partition, I wonder what we mean by number of columns, since different rows may have different columns? Is the number of columns the number of columns of the biggest row, or the union of all columns across all rows? E.g. if I have two rows, one has ten columns and the other has ten different columns, would that be considered a total of ten or twenty columns? I tend to still think of this in terms of storage partitions and how many storage columns a given one may contain. It's possible that the apache doc has not been updated to reflect the new language of partitions and cells, etc. A given partition can contain 2Bn storage columns, regardless of how many other columns there are in other partitions. =Rob
Re: Problems after trying a migration
On Wed, Mar 18, 2015 at 12:58 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER david.charbonn...@rgsystem.com wrote: - New nodes in the other country have been installed like the French nodes except for the DataStax Enterprise version (4.5.1 in France and 4.6.1 in the other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country) This is officially unsupported, and might cause problems during this process. As regards your other situation, I suggest joining #cassandra and pointing people there towards your summary and interactively discussing it with them. Mailing list lag is not best for operational issues. :) =Rob
Re: Limit on number of columns
Generally a concern about limitations on the number of columns is a concern about storage for rows in a partition. Cassandra is a column-oriented database, but this really refers to its cell-oriented storage structure, with each column name and column value pair being a single cell (except collections, which may occupy multiple cells per column, one for each value in the collection). So, the issue is not the total number of column names used, but the total number of cells used in a partition. So, for your example, you have 20 cell values and... 20 column names. -- Jack Krupansky On Wed, Mar 18, 2015 at 3:43 PM, Ruebenacker, Oliver A oliver.ruebenac...@altisource.com wrote: [...]
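[Editor's note: Jack's cell-counting rule is just a product per partition, which can be stated as a tiny sketch; the class name is made up for illustration:

```java
// Sketch: cells-per-partition arithmetic for the ~2 billion cells-per-partition cap.
// Under the storage-engine accounting, each (row, populated non-key column) pair
// in a partition is one cell; how many *distinct* column names exist is irrelevant.
public class PartitionCellEstimate {
    public static long cells(long rowsInPartition, long populatedColumnsPerRow) {
        return rowsInPartition * populatedColumnsPerRow;
    }

    public static void main(String[] args) {
        // Oliver's example: two rows, ten populated columns each -> 20 cells,
        // even though 20 distinct column names are involved.
        System.out.println(cells(2, 10));
        // A million rows of 20 columns in one partition, well under ~2e9:
        System.out.println(cells(1_000_000, 20));
    }
}
```
]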