Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
The rendering tool renders a portion of a very large image. It may fetch
different data each time from billions of rows, so I don't think I can cache
such large results, since the same results will rarely be fetched again.

Also, do you know how I can do 2D range queries using Cassandra? Some other
users suggested using Solr, but is there any way I can achieve that without
using any other technology?

Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
The data won't change much, but the queries will be different.
I am not working on the rendering tool myself, so I don't know many details
about it.

Also, as you suggested, I tried to fetch data in pages of 500 or 1000 with
the Java driver's auto-pagination.
It fails when the number of records is high (around 10) with the following
error:

Exception in thread main
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
server response))
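
For reference, the timeout in this trace is raised on the driver side and can
be adjusted independently of the server-side timeouts. A minimal sketch
against the 2.x Java driver (the contact point and the 120-second value are
placeholders, not settings from this thread):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    // The driver default is 12 s; pages slower than that surface as a
    // DriverException("Timed out waiting for server response") wrapped in
    // NoHostAvailableException, exactly as in the trace above.
    .withSocketOptions(new SocketOptions().setReadTimeoutMillis(120000))
    .build();
Session session = cluster.connect();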



Re: schema generation in cassandra

2015-03-18 Thread Ali Akhtar
Why are you creating new tables dynamically? I would try to use a static
schema and use a collection (list / map / set) for storing arbitrary data.
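
As an illustration of that suggestion, a static table with a map column can
absorb arbitrary key/value attributes without any runtime DDL. A sketch (the
table and column names are invented, and a connected Session is assumed):

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// One fixed table; per-record variability goes into the map column.
session.execute("CREATE TABLE IF NOT EXISTS sampletest.entity ("
    + " id uuid PRIMARY KEY,"
    + " attributes map<text, text> )");

Map<String, String> attrs = new HashMap<String, String>();
attrs.put("department", "sales");   // arbitrary keys, no ALTER TABLE needed
attrs.put("phone", "5551234");

session.execute("INSERT INTO sampletest.entity (id, attributes) VALUES (?, ?)",
    UUID.randomUUID(), attrs);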




Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
Yeah, it may be that the process is being limited by swap. This page:

https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42

Lines 42-48 list a few settings that you could try for increasing or
reducing the memory limits (assuming you're on Linux).

Also, are you using an SSD? If so, make sure the I/O scheduler is noop or
deadline.

On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Currently the Cassandra Java process is taking 1% of CPU (8% of the total
 is in use) and 14.3% of memory (out of 4G total).
 As you can see, there is not much load from other processes.

 Should I try changing the default memory parameters in the Cassandra settings?

 On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 What's your memory / CPU usage at? And how much RAM + CPU do you have on
 this server?



 On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Currently there is only a single node, which I am calling directly, with
 around 15 rows. The full data will be around billions of rows per node.
 The code is working only for a fetch size of 100/200, and each consecutive
 fetch is taking around 5-10 secs.

 I have a parallel script inserting the data while I am reading it. When I
 stopped the script, it worked for 500/1000 but not more than that.



 On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 If even 500-1000 isn't working, then your Cassandra node might not be up.

 1) Try running nodetool status from a shell on your Cassandra server, and
 make sure the nodes are up.

 2) Are you calling this on the same server where Cassandra is running?
 It's trying to connect to localhost. If you're running it on a different
 server, try passing in the direct IP of your Cassandra server.


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems
like the difference would only be a few minutes. Do you have to do this all
the time, or only once in a while?

On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Yes, it works for 1000 but not more than that.
 How can I fetch all rows using this efficiently?

 On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 Have you tried a smaller fetch size, such as 5k - 2k?

 On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Hi Jens,

 I have tried with a fetch size of 1; it's still not giving any results.
 My expectation was that Cassandra could handle a million rows easily.

 Is there any mistake in the way I am defining the keys or querying them?

 Thanks
 Mehak

 On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil jens.ran...@tink.se
 wrote:

 Hi,

 Try setting the fetch size before querying. Assuming you don't set it
 too high, and you don't have too many tombstones, that should do it.

 Cheers,
 Jens

 –
 Sent from Mailbox https://www.dropbox.com/mailbox


 On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu
  wrote:

 Hi,

 I have a requirement to fetch a million rows as the result of my query,
 which is giving timeout errors.
 I am fetching results by selecting clustering columns, so why are the
 queries taking so long? I can change the timeout settings, but I need the
 data to be fetched faster as per my requirement.

 My table definition is:

 CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar,
 analysis_execution_uuid uuid, x double, y double, loc varchar, w double,
 h double, normalized varchar, type varchar, filehost varchar,
 filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
 image_mpp_x double, image_mpp_y double, image_width double,
 image_height double, objective double, cancer_type varchar, Area float,
 submit_date timestamp, points list<double>,
 PRIMARY KEY ((image_caseid), Area, uuid));

 Here each row is uniquely identified by its unique uuid. But since my data
 is generally queried by image_caseid, I have made that the partition key.
 I am currently using the Java DataStax API to fetch the results, but the
 query is taking a lot of time, resulting in timeout errors:

  Exception in thread main
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: localhost/127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting 
 for
 server response))
  at
 com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
  at
 com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
  at
 com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
  at
 com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
  at QueryDB.queryArea(TestQuery.java:59)
  at TestQuery.main(TestQuery.java:35)
 Caused by:
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: localhost/127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting 
 for
 server response))
  at
 com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
  at
 com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)

 Also, when I try the same query on the console, it fails even with a limit
 of 2000 rows:

 cqlsh:images> select count(*) from results where
 image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
 errors={}, last_host=127.0.0.1

 Thanks and Regards,
 Mehak












schema generation in cassandra

2015-03-18 Thread Ankit Agarwal
Hi,

I am new to Cassandra. We are planning to use Cassandra for a cloud-based
application in our development environment, so I am looking for the best
strategies to sync the schema for micro-services while deploying the
application on Cloud Foundry.

One way I could use is an Accessor interface with the DataStax mapper and
the Cassandra core driver.



1.) I have created a keyspace using the core driver, which is created on
initialization of the servlet:

public void init() throws ServletException
{
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();

    String keySpace = "sampletest";

    session.execute("CREATE KEYSPACE IF NOT EXISTS " + keySpace +
        " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }");
    ...
}


2.) This is my Accessor interface, which I use to generate the query for
creating the column family:

@Accessor
public interface UserAccessor
{
    @Query("CREATE TABLE sampletest.emp (id uuid PRIMARY KEY, name text, "
        + "department text, location text, phone bigint) WITH caching = "
        + "'{ \"keys\" : \"ALL\" , \"rows_per_partition\" : \"NONE\" }'")
    ResultSet create_table();
}




3.) Creating an instance of the Accessor interface to provide the mapping
for our query, to generate the column family:

MappingManager mapper = new MappingManager(session);
UserAccessor ua = mapper.createAccessor(UserAccessor.class);
ua.create_table();



4.) So far I have created a keyspace with a column family; now I want to
map my data using the POJO class below:

@Table(keyspace = "sampletest", name = "emp")
public class Employee {

    @PartitionKey
    private UUID id;

    private String name;
    private String department;
    private String location;
    private Long phone;

    // getter and setter methods
    ...
}



Is there any other, better approach to achieve this, especially for a cloud
environment?


-- 
Thanks

Ankit Agarwal
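
For context on step 4, once the @Table-annotated POJO above exists, the same
MappingManager can read and write Employee rows directly. A sketch (the
sample values are invented, and getters/setters are assumed to exist):

import java.util.UUID;
import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.MappingManager;

MappingManager manager = new MappingManager(session);
Mapper<Employee> empMapper = manager.mapper(Employee.class);

Employee e = new Employee();
e.setId(UUID.randomUUID());
e.setName("Jane Doe");
empMapper.save(e);                         // INSERT generated from annotations

Employee back = empMapper.get(e.getId());  // SELECT by primary key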


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
4G also seems small for the kind of load you are trying to handle (billions
of rows).

I would also try adding more nodes to the cluster.

On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar ali.rac...@gmail.com wrote:

 Yeah, it may be that the process is being limited by swap. This page:


 https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42

 Lines 42 - 48 list a few settings that you could try out for increasing /
 reducing the memory limits (assuming you're on linux).

 Also, are you using an SSD? If so make sure the IO scheduler is noop or
 deadline .

 On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Currently Cassandra java process is taking 1% of cpu (total 8% is being
 used) and 14.3% memory (out of total 4G memory).
 As you can see there is not much load from other processes.

 Should I try changing default parameters of memory in Cassandra settings.

 On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 What's your memory / CPU usage at? And how much ram + cpu do you have on
 this server?



 On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta meme...@cs.stonybrook.edu
 wrote:

 Currently there is only single node which I am calling directly with
 around 15 rows. Full data will be in around billions per node.
 The code is working only for size 100/200. Also the consecutive
 fetching is taking around 5-10 secs.

 I have a parallel script which is inserting the data while I am reading
 it. When I stopped the script it worked for 500/1000 but not more than
 that.



 On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

  If even 500-1000 isn't working, then your cassandra node might not be
 up.

 1) Try running nodetool status from shell on your cassandra server,
 make sure the nodes are up.

 2) Are you calling this on the same server where cassandra is running?
 Its trying to connect to localhost . If you're running it on a different
 server, try passing in the direct ip of your cassandra server.

 On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta 
 meme...@cs.stonybrook.edu wrote:

 Data won't change much but queries will be different.
 I am not working on the rendering tool myself so I don't know much
 details about it.

 Also as suggested by you I tried to fetch data in size of 500 or 1000
 with java driver auto pagination.
 It fails when the number of records are high (around 10) with
 following error:

 Exception in thread main
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: localhost/127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting 
 for
 server response))


 On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 How often does the data change?

 I would still recommend a caching of some kind, but without knowing
 more details (how often the data is changing, what you're doing with 
 the 1m
 rows after getting them, etc) I can't recommend a solution.

 I did see your other thread. I would also vote for elasticsearch /
 solr , they are more suited for the kind of analytics you seem to be 
 doing.
 Cassandra is more for storing data, it isn't all that great for complex
 queries / analytics.

 If you want to stick to cassandra, you might have better luck if you
 made your range columns part of the primary key, so something like 
 PRIMARY
 KEY(caseId, x, y)

 On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta 
 meme...@cs.stonybrook.edu wrote:

 The rendering tool renders a portion a very large image. It may
 fetch different data each time from billions of rows.
 So I don't think I can cache such large results. Since same results
 will rarely fetched again.

 Also do you know how I can do 2d range queries using Cassandra.
 Some other users suggested me using Solr.
 But is there any way I can achieve that without using any other
 technology.

 On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 Sorry, meant to say that way when you have to render, you can
 just display the latest cache.

 On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 I would probably do this in a background thread and cache the
 results, that way when you have to render, you can just cache the 
 latest
 results.

 I don't know why Cassandra can't seem to be able to fetch large
 batch sizes, I've also run into these timeouts but reducing the 
 batch size
 to 2k seemed to work for me.

 On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta 
 meme...@cs.stonybrook.edu wrote:

 We have UI interface which needs this data for rendering.
 So efficiency of pulling this data matters a lot. It should be
 fetched within a minute.
 Is there a way to achieve such efficiency


 On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar 
 ali.rac...@gmail.com wrote:

 Perhaps just fetch them in batches of 1000 or 2000? For 1m
 rows, it seems like the difference would only be a few minutes. Do 
 you have
 to 

Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
How often does the data change?

I would still recommend caching of some kind, but without knowing more
details (how often the data is changing, what you're doing with the 1m rows
after getting them, etc.) I can't recommend a solution.

I did see your other thread. I would also vote for Elasticsearch / Solr;
they are more suited for the kind of analytics you seem to be doing.
Cassandra is more for storing data; it isn't all that great for complex
queries / analytics.

If you want to stick to Cassandra, you might have better luck if you made
your range columns part of the primary key, so something like PRIMARY
KEY (caseId, x, y) (see the sketch below).
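
A sketch of that layout, with one caveat: CQL allows a range predicate on
only one clustering column per query, so a 2D box still needs the second
dimension filtered client-side (the table name and values are illustrative):

// Clustering by (x, y) lets Cassandra serve a contiguous slice on x.
session.execute("CREATE TABLE IF NOT EXISTS images.results_by_xy ("
    + " image_caseid varchar, x double, y double, uuid uuid,"
    + " PRIMARY KEY ((image_caseid), x, y, uuid) )");

// Server-side range on x; y is checked in the client.
ResultSet rs = session.execute(
    "SELECT x, y, uuid FROM images.results_by_xy"
    + " WHERE image_caseid = ? AND x >= ? AND x < ?",
    "TCGA-HN-A2NL-01Z-00-DX1", 1000.0, 2000.0);
for (Row row : rs) {
    if (row.getDouble("y") >= 500.0 && row.getDouble("y") < 1500.0) {
        // row falls inside the 2D box
    }
}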

Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
Yes, I have a cluster of 10 nodes in total, but I am currently testing with
just one node.
The total data across all nodes will exceed 5 billion rows. But I may have
more memory on the other nodes.


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
We have a UI which needs this data for rendering, so the efficiency of
pulling this data matters a lot. It should be fetched within a minute.
Is there a way to achieve such efficiency?



Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
I would probably do this in a background thread and cache the results; that
way, when you have to render, you can just cache the latest results.

I don't know why Cassandra doesn't seem to be able to fetch large batch
sizes; I've also run into these timeouts, but reducing the batch size to 2k
seemed to work for me.
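
A minimal sketch of that background-refresh idea (the query, fetch size, and
30-second period are arbitrary choices, and a connected Session is assumed):

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import com.datastax.driver.core.*;

// Holds the most recent result; the renderer reads it without blocking.
final AtomicReference<List<Row>> latest = new AtomicReference<List<Row>>();

ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
refresher.scheduleAtFixedRate(new Runnable() {
    public void run() {
        Statement stmt = new SimpleStatement(
            "SELECT * FROM images.results WHERE image_caseid = ?", "some-case-id");
        stmt.setFetchSize(1000);
        latest.set(session.execute(stmt).all());  // drain all pages off the UI path
    }
}, 0, 30, TimeUnit.SECONDS);

// Rendering path: always instant, possibly slightly stale.
List<Row> snapshot = latest.get();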


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
Sorry, meant to say: that way, when you have to render, you can just display
the latest cache.


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Jack Krupansky
Cassandra can certainly handle millions and even billions of rows, but...
it is a very clear anti-pattern to design a single query to return more
than a relatively small number of rows except through paging. How small?
Low hundreds is probably a reasonable limit. It is also an anti-pattern to
filter or analyze a large number of rows in a single query - that's why
there are so many crazy restrictions and the requirement to use ALLOW
FILTERING - to reinforce that Cassandra is designed for short and
performant queries, not large-scale retrieval of a large number of rows. As
a general rule, the use of ALLOW FILTERING is an anti-pattern and a yellow
flag that you are doing something wrong.

As a minor point, check your partition key - you should try to bucket
rows that will tend to be accessed together, so that they have locality
and can be fetched together.

Rather than using a raw x and y coordinate range, consider indexing by a
chunk number; then you can query by chunk number for direct access to the
partition and row key, without the need for inequality filtering (a sketch
follows below).


-- Jack Krupansky
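
An illustration of the chunking idea above (a sketch only; the chunk size,
table, and column names are invented for this example):

// Bucket each row into a fixed-size spatial chunk, so a viewport maps to a
// handful of partitions addressed by equality instead of inequality filters.
final double CHUNK = 1000.0;        // assumed tile size in image coordinates
double x = 12345.0, y = 6789.0;     // example coordinates
long chunkX = (long) Math.floor(x / CHUNK);
long chunkY = (long) Math.floor(y / CHUNK);

session.execute("CREATE TABLE IF NOT EXISTS images.results_by_chunk ("
    + " image_caseid varchar, chunk_x bigint, chunk_y bigint,"
    + " x double, y double, uuid uuid,"
    + " PRIMARY KEY ((image_caseid, chunk_x, chunk_y), x, y, uuid) )");

// Direct partition access for one tile; loop over the tiles covering the
// viewport, with no inequality filtering needed to reach the data.
ResultSet tile = session.execute(
    "SELECT x, y, uuid FROM images.results_by_chunk"
    + " WHERE image_caseid = ? AND chunk_x = ? AND chunk_y = ?",
    "TCGA-HN-A2NL-01Z-00-DX1", chunkX, chunkY);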


Not seeing keyspace in nodetool compactionhistory

2015-03-18 Thread Ali Akhtar
When I run nodetool compactionhistory, I'm only seeing the system
keyspace and the OpsCenter keyspace in the compactions. I only see one mention
of my own keyspace, but it's only for the smallest table within that
keyspace (containing only about 1k rows). My two other tables, containing
1.1m and 100k rows respectively, don't appear at all.

Any reason why that is?

I did fill up the data in those two tables within the span of about 4 hours
(I ran a script to migrate existing data from legacy RDBMS databases). Could
that have something to do with it?

I'm using SizeTieredCompactionStrategy for all tables.


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Eric Stevens
From your description, it sounds like you have a single partition key with
millions of clustered values on the same partition.  That's a very wide
partition.  You may very likely be causing a lot of memory pressure in your
Cassandra node (especially at 4G) while trying to execute the query.
Although the hard upper limit is 2 billion values per partition key, the
practical limit is much lower, sometimes more like 100k.  Also with very
wide partitions, you cannot take advantage of Cassandra's distributed
nature for reads, only one node will be involved in the read, so one node
will perform as well as a million nodes.

If bounding by area is a common task, then it might make sense to put area
or at least part of area into the partition key (bucket by area / 10 or /
100 or something) just to distribute the data around your cluster a little
better.  It makes your query path a little more involved, but it buys you
parallelism (you could execute all area buckets in a given query
simultaneously, and if your cluster is large enough, typically only one node
is involved for each area bucket).
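
As a sketch of what that bucketing could look like with the Java driver,
given an open Session - the bucket width of 100, the table name, and the
column names are assumptions for illustration:

// Hypothetical layout: PRIMARY KEY ((image_caseid, area_bucket), area, uuid)
// with area_bucket = (int) (area / 100). A range query over area 20..100
// touches buckets 0 and 1, and each bucket can be read in parallel.
PreparedStatement ps = session.prepare(
    "SELECT uuid, area FROM results_by_bucket"
    + " WHERE image_caseid = ? AND area_bucket = ?"
    + " AND area > ? AND area < ?");

List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
for (int bucket = 0; bucket <= 1; bucket++) {
    futures.add(session.executeAsync(
        ps.bind("TCGA-HN-A2NL-01Z-00-DX1", bucket, 20f, 100f)));
}
for (ResultSetFuture f : futures) {
    for (Row row : f.getUninterruptibly()) {
        // merge rows from all buckets here
    }
}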

I wonder what your write pattern is like to fill the data in for a given
case ID.  Are you appending to the same partition key over a long period of
time?  If so, you may be scattering the data for a given partition key over
a large number of SSTables, and slowing down the read dramatically.  If
you're using size tiered compaction, do nodetool compact on that table and
wait for the node to settle down (0 outstanding/pending tasks in nodetool
compactionstats), then see if performance improves (you may also be able to
use nodetool cfhistograms to see how many sstables are being involved in a
read typically, but if all your queries are timing out, I'm not sure if
that will be an accurate reflection or not).

 It may fetch different data each time from billions of rows.
 My expectations were that Cassandra can handle a million rows easily.

I have a data set several orders of magnitude larger than what you're
talking about WRT your final data size, and with appropriate query and
storage patterns, Cassandra can definitely handle this kind of data.

One final note, your column names are pretty long.  You pay to store each
column name each time you store that column.  On small data sets it doesn't
matter, but at billions of rows it starts to add up.  There's negligible
(but nonzero) performance cost, but over time you may find that you have to
scale out just because you're filling up disks. See
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html



Recommended TTL time for max. performance with DateTieredCompactionStrategy?

2015-03-18 Thread Ali Akhtar
I have a table which is going to store temporary search results. The
results will be available for a short time (anywhere from 1 to 24 hours)
from the time of the search, and then should be deleted to free up disk
space.

This is going to apply to all the rows within this table.

What would be the recommended TTL for this table, so that it works
best with DateTieredCompactionStrategy and causes whole SSTables to be
dropped rather than keeping tombstones?
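
For reference, a sketch of the kind of definition being asked about, assuming
Cassandra 2.0.11+ where DateTieredCompactionStrategy is available; the names
and option values are illustrative, not a recommendation:

// All rows expire 24h after they are written; with time-ordered writes,
// whole sstables age out together and can be dropped instead of compacted.
session.execute("CREATE TABLE search_results ("
    + " search_id uuid, result_id timeuuid, payload text,"
    + " PRIMARY KEY (search_id, result_id))"
    + " WITH default_time_to_live = 86400"
    + " AND compaction = { 'class' : 'DateTieredCompactionStrategy',"
    + "   'max_sstable_age_days' : '1' }");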

Thanks.


RE: Problems after trying a migration

2015-03-18 Thread David CHARBONNIER
Hi Fabien,

Thank you for the link! That's exactly what we want to do.
But before starting this, we need to clean up the mess in order to get a clean 
cluster.

Thanks for your help.

Best regards,

David CHARBONNIER

Sysadmin

T : +33 411 934 200

david.charbonn...@rgsystem.com

ZAC Aéroport

125 Impasse Adam Smith

34470 Pérols - France

www.rgsystem.com


From: Fabien Rousseau [mailto:fab...@yakaz.com]
Sent: Wednesday, March 18, 2015 17:32
To: user
Subject: Re: Problems after trying a migration

Hi David,

There is an excellent article which describes exactly what you want to do
(i.e. migrate from one DC to another):
http://planetcassandra.org/blog/cassandra-migration-to-ec2/

2015-03-18 17:05 GMT+01:00 David CHARBONNIER david.charbonn...@rgsystem.com:
Hi,

We’re using Cassandra through the Datastax Enterprise package in version 4.5.1 
(Cassandra version 2.0.8.39) with 7 nodes in a single datacenter.

We need to move our Cassandra cluster from France to another country. To do 
this, we want to add a second 7-node datacenter to our cluster and stream all 
data between the two countries before dropping the first datacenter.

On January 31st, we tried doing so but we had some problems:

-  New nodes in the other country were installed like the French nodes, 
except for the Datastax Enterprise version (4.5.1 in France and 4.6.1 in the 
other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 
in the other country)

-  The following procedure was followed: 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
 but an error occurred during step 3. New nodes were started before the 
cassandra-topology.properties file was updated on the original datacenter, so 
the new nodes appeared in the original datacenter instead of the new one.

-  To recover our original cluster, we decommissioned every node of 
the new datacenter with the nodetool decommission command.

On February 9th, the nodes in the second datacenter were restarted and joined 
the cluster. We had to decommission them just like before.

On February 11th, we added disk space on our 7 running French nodes. To achieve 
this, we restarted the cluster, but the nodes updated their peering information 
and the nodes from Luxembourg (decommissioned on February 9th) were present 
again. This behaviour is described here: 
https://issues.apache.org/jira/browse/CASSANDRA-7825. So we cleaned up the 
system.peers table content.

On March 11th, we needed to add an 8th node to our existing French cluster. We 
installed the same Datastax Enterprise version (4.5.1 with Cassandra 2.0.8.39) 
and tried to add this node to the cluster with this procedure: 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html.
 In OpsCenter, the node was joining the cluster and data streaming got stuck at 
100%. After several hours, nodetool status showed us that the node was still 
joining, but nothing in the logs let us know there was a problem. We restarted 
the node, but it had no effect. Then we cleaned the data and commitlog contents 
and tried to add the node to the cluster again, but without result.
Our last try was to add the node with auto_bootstrap : false in order to add 
the node to the cluster manually, but it messed up the data. So we shut down 
the node and removed it (with nodetool removenode). The whole cluster has been 
repaired and we stopped doing anything further.

Now, our cluster has only 7 French nodes, to which we can't add any node. The 
OpsCenter data has disappeared and we are working without any information about 
how our cluster is running.

You’ll find attached to this email our current configuration and a screenshot 
of our OpsCenter metrics page.

Do you have any ideas on how to clean up the mess and get our cluster running 
cleanly before we start our migration (France to another country, as described 
at the beginning of this email)?

Thank you.

Best regards,

David CHARBONNIER

Sysadmin

T : +33 411 934 200

david.charbonn...@rgsystem.com

ZAC Aéroport

125 Impasse Adam Smith

34470 Pérols - France

www.rgsystem.com







--
Fabien Rousseau

www.yakaz.com


Re: Problems after trying a migration

2015-03-18 Thread Fabien Rousseau
Hi David,

There is an excellent article which describes exactly what you want to do
(i.e. migrate from one DC to another):
http://planetcassandra.org/blog/cassandra-migration-to-ec2/



-- 
Fabien Rousseau


 aur...@yakaz.com
 www.yakaz.com


Saving a file using cassandra

2015-03-18 Thread jean paul
Hello,

Finally, I have created my ring using Cassandra.
Please, I'd like to store a file replicated 2 times in my cluster.
Is that possible? Can you please send me a link to a tutorial?


Thanks a lot.
Best Regards.
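
Not a full tutorial, but a minimal sketch of the idea with the Java driver.
The keyspace/table names are illustrative, replication_factor 2 gives the two
copies, and the single-blob-per-file layout only suits small files (large
files should be split into chunks across several rows):

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect();

// Two copies of every row across the ring.
session.execute("CREATE KEYSPACE IF NOT EXISTS files WITH replication ="
    + " { 'class' : 'SimpleStrategy', 'replication_factor' : 2 }");
session.execute("CREATE TABLE IF NOT EXISTS files.blobs"
    + " ( name text PRIMARY KEY, content blob )");

// Read the file into memory and store it as a single blob value.
ByteBuffer bytes = ByteBuffer.wrap(Files.readAllBytes(Paths.get("/tmp/example.bin")));
PreparedStatement ps = session.prepare(
    "INSERT INTO files.blobs (name, content) VALUES (?, ?)");
session.execute(ps.bind("example.bin", bytes));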


Upgrade from 1.2.19 to 2.0.12 -- seeing lots of SliceQueryFilter messages in system.log

2015-03-18 Thread Caraballo, Rafael
After upgrading a 3-node Cassandra cluster from 1.2.19 to 2.0.12, I have an 
event storm of SliceQueryFilter messages flooding the Cassandra system.log 
file.

WARN [ReadStage:1043] 2015-03-18 15:14:12,708 SliceQueryFilter.java (line 231) 
Read 201 live and 13539 tombstoned cells in KeyspaceMetadata.CF_Folder (see 
tombstone_warn_threshold). 200 columns was requested, 
slices=[154184c2-85c1-11e2-b12e-c2ed2ac02b21-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, 
ranges=[dc70cafe-ed8a-11e2-a178-5756012ec923-dc70cafe-ed8a-11e2-a178-5756012ec923:!,
 deletedAt=1424741296925196, 
localDeletion=1424741340][82bcb57a-ed8c-11e2-8fbd-3fb065c6b097-82bcb57a-ed8c-11e2-8fbd-3fb065c6b097:!,
 deletedAt=1424741296925196,...

This is the table definition referenced above:

CREATE TABLE CF_Folder (
  key blob,
  column1 timeuuid,
  column2 blob,
  value blob,
  PRIMARY KEY ((key), column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.10 AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=518400 AND
  read_repair_chance=0.10 AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  compaction={'sstable_size_in_mb': '160', 'class': 
'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

How can I stop this event storm?

Thanks,
Rafael Caraballo
Time Warner Cable




Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Jens Rantil
Hi,

Try setting fetchsize before querying. Assuming you don't set it too high, and 
you don't have too many tombstones, that should do it.
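
For example, with the Java driver, given an open Session (a sketch; the fetch
size of 1000 is just a starting point to tune):

// The server returns pages of ~1000 rows; the driver fetches the next page
// transparently as iteration crosses each page boundary, so only about one
// page is held in memory at a time.
Statement stmt = new SimpleStatement(
    "SELECT * FROM results WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'")
    .setFetchSize(1000);
for (Row row : session.execute(stmt)) {
    // process one row at a time
}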

Cheers,
Jens



–
Sent from Mailbox

On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu
wrote:

 Hi,
 I have a requirement to fetch a million rows as the result of my query, which
 is giving timeout errors.
 I am fetching results by selecting clustering columns, so why are the queries
 taking so long? I can change the timeout settings, but I need the data to be
 fetched faster as per my requirement.
 My table definition is:
 CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar,
 analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h
 double, normalized varchar, type varchar, filehost varchar, filename
 varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
 image_mpp_x double, image_mpp_y double, image_width double, image_height
 double, objective double, cancer_type varchar, Area float, submit_date
 timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid));
 Here each row is uniquely identified on the basis of a unique uuid. But since
 my data is generally queried based upon image_caseid, I have made it the
 partition key.
 I am currently using the Java Datastax API to fetch the results. But the query
 is taking a lot of time, resulting in timeout errors:
 Exception in thread main
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: localhost/127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
 server response))
 at
 com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
 at
 com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
 at
 com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
 at QueryDB.queryArea(TestQuery.java:59)
 at TestQuery.main(TestQuery.java:35)
 Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
 All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
 server response))
 at
 com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
 at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Also, when I try the same query on the console, it fails even while using a
 limit of 2000 rows:
 cqlsh:images> select count(*) from results where
 image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area < 100 and Area > 20 limit 2000;
 errors={}, last_host=127.0.0.1
 Thanks and Regards,
 Mehak

Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
Hi Jens,

I have tried with a fetch size of 1, but it's still not giving any results.
My expectation was that Cassandra could handle a million rows easily.

Is there any mistake in the way I am defining the keys or querying them?

Thanks
Mehak


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Ali Akhtar
Have you tried a smaller fetch size, such as 5k - 2k?


Re: Timeout error in fetching million rows as results using clustering keys

2015-03-18 Thread Mehak Mehta
Yes, it works for 1000 but not for more than that.
How can I fetch all rows efficiently this way?


Re: schema generation in cassandra

2015-03-18 Thread Ankit Agarwal
Thanks a lot for your responses!

My question is: what are the best practices for database schema
deployment for a microservice in a cloud environment?

E.g., should the schema be created as part of the microservice's
deployment, be generated via code, or not be generated via code at all
and instead be managed separately?
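
One common pattern - a sketch only, with illustrative names, given an open
Session - is to apply idempotent CQL at service startup, so that re-running
it on every deploy is harmless:

// Every statement is IF NOT EXISTS, so repeated runs are no-ops.
String[] schema = {
    "CREATE KEYSPACE IF NOT EXISTS sampletest WITH replication ="
        + " { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }",
    "CREATE TABLE IF NOT EXISTS sampletest.emp ( id uuid PRIMARY KEY,"
        + " name text, department text, location text, phone bigint )"
};
for (String stmt : schema) {
    session.execute(stmt);
}

Note that concurrent DDL from many instances can still race, so gating the
schema step behind a single migration job (or a dedicated migration tool) is
the safer variant of the same idea.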

On Wed, Mar 18, 2015 at 3:29 PM, Ali Akhtar ali.rac...@gmail.com wrote:

 Why are you creating new tables dynamically? I would try to use a static
 schema and use a collection (list / map / set) for storing arbitrary data.

 On Wed, Mar 18, 2015 at 2:52 PM, Ankit Agarwal 
 agarwalankit.k...@gmail.com wrote:

 Hi,

  I am new to Cassandra. We are planning to use Cassandra for a cloud-based
  application in our development environment, so I am looking for the best
  strategies to sync the schema for micro-services while deploying the
  application on Cloud Foundry.

  One way I could use is the Accessor interface with the datastax-mapper
  and cassandra-core driver.



  1.) I have created a keyspace using the core driver, created on
  initialization of the servlet:

  public void init() throws ServletException
  {
      Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
      Session session = cluster.connect();

      String keySpace = "sampletest";
      session.execute("CREATE KEYSPACE IF NOT EXISTS " + keySpace +
          " WITH REPLICATION = { 'class' : 'SimpleStrategy'," +
          " 'replication_factor' : 1 }");
      ...
  }


  2.) This is my Accessor interface, which I used to generate the query for
  creating the column family:

  @Accessor
  public interface UserAccessor
  {
      @Query("CREATE TABLE sampletest.emp (id uuid PRIMARY KEY, name text,"
          + " department text, location text, phone bigint) WITH caching ="
          + " '{ \"keys\" : \"ALL\", \"rows_per_partition\" : \"NONE\" }'")
      ResultSet create_table();
  }




  3.) Creating an instance of the Accessor interface to map the query that
  creates the column family:

  ...
  MappingManager mapper = new MappingManager(session);
  UserAccessor ua = mapper.createAccessor(UserAccessor.class);
  ua.create_table();
  ...



  4.) So far I have created a keyspace with a column family; now I want to
  map my data using the POJO class below:

  @Table(keyspace = "sampletest", name = "emp")
  public class Employee {

      @PartitionKey
      private UUID id;
      private String name;
      private String department;
      private String location;
      private Long phone;

      // getter/setter methods
      ...
  }
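
  As a possible simplification for step 4 - a sketch assuming the driver's
  object-mapping module (cassandra-driver-mapping) is on the classpath and
  the POJO has the usual getters/setters:

  MappingManager manager = new MappingManager(session);
  Mapper<Employee> mapper = manager.mapper(Employee.class);

  Employee e = new Employee();
  e.setId(UUID.randomUUID());
  e.setName("Jane");
  mapper.save(e);                         // generates the INSERT
  Employee back = mapper.get(e.getId());  // SELECT by primary key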



  Is there any other, better approach to achieve this, especially for a cloud
  environment?


 --
 Thanks

 Ankit Agarwal





-- 
Thanks & Regards

Ankit Agarwal
+91-9953235575


Re: Problems after trying a migration

2015-03-18 Thread Jan

Hi David;

Some input to get back to where you were:
a) Start with the French cluster only and get it working with DSE 4.5.1.
b) The OpsCenter keyspace is RF 1 by default; alter the keyspace to RF 3.
c) Take a full snapshot of all your nodes and copy the files to a safe
location on all the nodes.

To migrate the data into the new cluster:
a) Use the same version, DSE 4.5.1, in Luxembourg and bring up 1 node at a
time. Check that the node has come up in the new datacenter.
b) Bring up new nodes into the new datacenter one at a time.
c) After all your new nodes are UP in Luxembourg, conduct a 'nodetool repair
-parallel'.
d) Check in OpsCenter that you have all your nodes showing up (new and old).
e) Start taking down your nodes in France, one at a time.
f) After all the nodes in France are down, conduct a 'nodetool repair
-parallel' again.
g) Upgrade the nodes in Luxembourg to DSE 4.6.1.
h) Conduct a 'nodetool repair -parallel' again.
i) Upgrade to OpsCenter 5.1.
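
For step b), a sketch of the alteration (the datacenter name 'FR' is a
placeholder - match it to your actual topology, then repair so the extra
replicas get the data, e.g. nodetool repair OpsCenter on each node):

session.execute("ALTER KEYSPACE \"OpsCenter\" WITH replication ="
    + " { 'class' : 'NetworkTopologyStrategy', 'FR' : 3 }");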
Best of luck; hope this helps.
Jan/
 



On Wednesday, March 18, 2015 1:01 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER 
david.charbonn...@rgsystem.com wrote:

- New nodes in the other country have been installed like French nodes 
except for Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other 
country which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the 
other country)

This is officially unsupported, and might cause problems during this process.
=Rob 

  

Re: Upgrade from 1.2.19 to 2.0.12 -- seeing lots of SliceQueryFilter messages in system.log

2015-03-18 Thread Robert Coli
On Wed, Mar 18, 2015 at 10:14 AM, Caraballo, Rafael 
rafael.caraba...@twcable.com wrote:

  After upgrading a 3 node Cassandra cluster from 1.2.19 to 2.0.12, I have
 an event storm of “ SliceQueryFilter” messages flooding the Cassandra
 system.log file.



 How can I stop this event storm?



As the message says :

 (see tombstone_warn_threshold) 

The thing you are being warned about is that your write pattern results in
a significant number of tombstones. In general this is a smell of badness
in Cassandra, which is why the log message exists.

To Resolve :

1) Increase tombstone_warn_threshold
OR
2) Stop creating so many tombstones
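
For option 1, the knobs live in cassandra.yaml; the values below are the
2.0-era defaults as I recall them, so verify against your version before
raising anything:

tombstone_warn_threshold: 1000       # log a warning past this many tombstones per slice
tombstone_failure_threshold: 100000  # abort the read past this many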

=Rob
http://twitter.com/rcolidba


Limit on number of columns

2015-03-18 Thread Ruebenacker, Oliver A

 Hello,

  For the limit on the number of cells 
(http://wiki.apache.org/cassandra/CassandraLimitations) (columns * rows) per 
partition, I wonder what we mean by the number of columns, since different rows 
may have different columns. Is the number of columns the number of columns of 
the biggest row, or the union of all columns across all rows? E.g. if I have 
two rows, one has ten columns and the other has ten different columns, would 
that be considered a total of ten or twenty columns?

  Thanks!

 Best, Oliver

Oliver Ruebenacker | Solutions Architect

Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com



Re: Limit on number of columns

2015-03-18 Thread Robert Coli
On Wed, Mar 18, 2015 at 12:43 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com wrote:

For the limit on the number of cells
 (http://wiki.apache.org/cassandra/CassandraLimitations) (columns * rows)
 per partition, I wonder what we mean by number of columns, since different
 rows may have different columns? Is the number of columns the number of
 columns of the biggest row, or the union of all columns across all rows?
 E.g. if I have two rows, one has ten columns and the other has ten
 different columns, would that be considered a total of ten or twenty
 columns?


I tend to still think of this in terms of storage partitions and how many
storage columns a given one may contain. It's possible that the Apache docs
have not been updated to reflect the new language of "partitions", "cells",
etc.

A given partition can contain up to 2 billion storage columns, regardless of
how many columns there are in other partitions.

=Rob


Re: Problems after trying a migration

2015-03-18 Thread Robert Coli
On Wed, Mar 18, 2015 at 12:58 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER 
 david.charbonn...@rgsystem.com wrote:

  -  New nodes in the other country have been installed like
 French nodes except for Datastax Enterprise version (4.5.1 in France and
 4.6.1 in the other country which means Cassandra version 2.0.8.39 in France
 and 2.0.12.200 in the other country)


 This is officially unsupported, and might cause problems during this
 process.


As regards your other situation, I suggest joining #cassandra and pointing
people there towards your summary and interactively discussing it with
them. Mailing list lag is not best for operational issues. :)

=Rob


Re: Limit on number of columns

2015-03-18 Thread Jack Krupansky
Generally, a concern about limits on the number of columns is a concern about
storage for rows in a partition. Cassandra is a column-oriented database,
but this is really referring to its cell-oriented storage structure, with
each column name and column value pair being a single cell (except
collections, which may occupy multiple cells per column, one for each value
in the collection.) So, the issue is not the total number of column names
used, but the total number of cells used in a partition. So, for your
example, you have 20 cell values and... 20 column names.
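
To make the arithmetic concrete: under this counting, a partition holding one
million CQL rows of 20 regular columns each stores roughly 1,000,000 x 20 = 20
million cells - well under the 2-billion hard limit per partition, but already
wide enough to deserve attention in practice.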

-- Jack Krupansky
