Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Reid Pinchback
I defer to Sean’s comment on materialized views.  I’m more familiar with 
DynamoDB on that front, where you do this pretty routinely.  I was curious so I 
went looking. This appears to be the C* Jira that points to many of the problem 
points:

https://issues.apache.org/jira/browse/CASSANDRA-13826

Abdul, you’d probably want to refer to that or similar info.  Could be that the 
more practical resolution is to just have the client write the data twice, if 
there are two very different query patterns to support.  Writes usually have 
quite low latency in C*, so double-writing may be less of a performance hit, 
and later drag on memory on I/O, than a query model that makes you browse 
through more data than necessary.

From: "Durity, Sean R" 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 6, 2020 at 4:24 PM
To: "user@cassandra.apache.org" 
Subject: RE: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
Reid is right. You build the tables to easily answer the queries you want. So, 
start with the query! I inferred a query for you based on what you mentioned. 
If my inference is wrong, the table structure is likely wrong, too.

So, what kind of query do you want to run?

(NOTE: a select count(*) that is not restricted to within a single partition is 
a very bad option. Don’t do that)

The query for my table below is simply:
select user_count [, other columns] from users_by_day where date = ? and hour = 
? and minute = ?


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel mailto:abd786...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
this is the schema similar to what we have , they want to get user connected  - 
concurrent count for every say 1-5 minutes.
i am thinking will simple select will have performance issue or we can go for 
materialized views ?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

Its sort of user connected, app team needa number of active users connected say 
 every 1 to 5 mins.
The timeout at app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if

RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Reid is right. You build the tables to easily answer the queries you want. So, 
start with the query! I inferred a query for you based on what you mentioned. 
If my inference is wrong, the table structure is likely wrong, too.

So, what kind of query do you want to run?

(NOTE: a select count(*) that is not restricted to within a single partition is 
a very bad option. Don’t do that)

The query for my table below is simply:
select user_count [, other columns] from users_by_day where date = ? and hour = 
? and minute = ?


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel mailto:abd786...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
this is the schema similar to what we have , they want to get user connected  - 
concurrent count for every say 1-5 minutes.
i am thinking will simple select will have performance issue or we can go for 
materialized views ?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

Its sort of user connected, app team needa number of active users connected say 
 every 1 to 5 mins.
The timeout at app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org<mailto:user-unsubscr...@cassandra.apache.org>
For additional commands, e-mail: 
user-h...@cassandra.apache.org<mailto:user-h...@cassandra.apache.org>



RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
From reports on this mailing list, I do not allow materialized views.


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel mailto:abd786...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
this is the schema similar to what we have , they want to get user connected  - 
concurrent count for every say 1-5 minutes.
i am thinking will simple select will have performance issue or we can go for 
materialized views ?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

Its sort of user connected, app team needa number of active users connected say 
 every 1 to 5 mins.
The timeout at app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org<mailto:user-unsubscr...@cassandra.apache.org>
For additional commands, e-mail: 
user-h...@cassandra.apache.org<mailto:user-h...@cassandra.apache.org>



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 

Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Reid Pinchback
Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" 
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
this is the schema similar to what we have , they want to get user connected  - 
concurrent count for every say 1-5 minutes.
i am thinking will simple select will have performance issue or we can go for 
materialized views ?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

Its sort of user connected, app team needa number of active users connected say 
 every 1 to 5 mins.
The timeout at app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org<mailto:user-unsubscr...@cassandra.apache.org>
For additional commands, e-mail: 
user-h...@cassandra.apache.org<mailto:user-h...@cassandra.apache.org>



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable

Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Abdul Patel
this is the schema similar to what we have , they want to get user
connected  - concurrent count for every say 1-5 minutes.
i am thinking will simple select will have performance issue or we can go
for materialized views ?

CREATE TABLE  usr_session (

userid bigint,

session_usr text,

last_access_time timestamp,

login_time timestamp,

status int,

PRIMARY KEY (userid, session_usr)

) WITH CLUSTERING ORDER BY (session_usr ASC)



On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
wrote:

> Do you only need the current count or do you want to keep the historical
> counts also? By active users, does that mean some kind of user that the
> application tracks (as opposed to the Cassandra user connected to the
> cluster)?
>
>
>
> I would consider a table like this for tracking active users through time:
>
>
>
> Create table users_by_day (
>
> app_date date,
>
> hour integer,
>
> minute integer,
>
> user_count integer,
>
> longest_login_user text,
>
> longest_login_seconds integer,
>
> last_login datetime,
>
> last_login_user text )
>
> primary key (app_date, hour, minute);
>
>
>
> Then, your reporting can easily select full days or a specific, one-minute
> slice. Of course, the app would need to have a timer and write out the
> data. I would also suggest a TTL on the data so that you only keep what you
> need (a week, a year, whatever). Of course, if your reporting requires
> different granularities, you could consider a different time bucket for the
> table (by hour, by week, etc.)
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Abdul Patel 
> *Sent:* Thursday, February 6, 2020 1:54 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Running select against cassandra
>
>
>
> Its sort of user connected, app team needa number of active users
> connected say  every 1 to 5 mins.
>
> The timeout at app end is 120ms.
>
>
>
>
>
> On Thursday, February 6, 2020, Michael Shuler 
> wrote:
>
> You'll have to be more specific. What is your table schema and what is the
> SELECT query? What is the normal response time?
>
> As a basic guide for your general question, if the query is something sort
> of irrelevant that should be stored some other way, like a total row count,
> or most any SELECT that requires ALLOW FILTERING, you're doing it wrong and
> should re-evaluate your data model.
>
> 1 query per minute is a minuscule fraction of the basic capacity of
> queries per minute that a Cassandra cluster should be able to handle with
> good data modeling and table-relevant query. All depends on the data model
> and query.
>
> Michael
>
> On 2/6/20 12:20 PM, Abdul Patel wrote:
>
> Hi,
>
> Is it advisable to run select query to fetch every minute to grab data
> from cassandra for reporting purpose, if no then whats the alternative?
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel 
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

Its sort of user connected, app team needa number of active users connected say 
 every 1 to 5 mins.
The timeout at app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?


-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
user-h...@cassandra.apache.org



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.