Re: Cassandra2.0.14 : Obsolete files not being deleted after compaction

2020-02-06 Thread Laxmikant Upadhyay
Hi,
Just an update: we deleted the obsolete sstables and it worked fine. However, I
have not been able to find any JIRA for this issue.

On Wed, Jan 22, 2020 at 3:58 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Thanks Jeff.
>
> There was no restart between "Compacting" and "Compacted" logs but I
> observed that full repair (-pr) was running at that time with errors.
>
> *Caused by: java.lang.RuntimeException: java.io.IOException: Cannot
> proceed on repair because a neighbor (/aa.bb.cc.dd) is dead: session failed*
>
> Does anyone remember any JIRA ticket related to obsolete sstables not
> being deleted after compaction?
>
> Regards
> Manish
>
>
>
>
>
> On Wed, Jan 22, 2020 at 11:37 AM Jeff Jirsa  wrote:
>
>>
>>
>> On Tue, Jan 21, 2020 at 8:58 PM manish khandelwal <
>> manishkhandelwa...@gmail.com> wrote:
>>
>>> Thanks Nitan,
>>>
>>>  Thanks for your reply.
>>>
>>> I am using the following method to find obsolete sstables, and I just want
>>> to make sure that I don't delete live data if I delete them.
>>>
>>> In the following logs I searched for sstable
>>> "keyspace-columnfamily-jb-456789" and found that thread
>>> "CompactionExecutor:1957" compacted keyspace-columnfamily-jb-123456-Data.db,
>>> keyspace-columnfamily-jb-234567-Data.db and
>>> keyspace-columnfamily-jb-345678-Data.db. These files are still present in my
>>> data directory, so I am assuming that they are obsolete. Is my assumption
>>> correct?
>>>
>>
>> The lines from 'Compacting' are the ones obsoleted IF and ONLY IF you see
>> a completed "Compacted" line for the same thread without a restart in
>> between.
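
The rule above can be checked mechanically. What follows is a minimal Python
sketch, not a supported tool. It assumes each log entry sits on a single line
(as in the real system.log, before mail wrapping). The helper name
obsolete_inputs and the restart_marker default are assumptions made for
illustration; in particular, verify against your own logs which text marks a
node startup.

```python
import re

THREAD_RE = re.compile(r"\[(CompactionExecutor:\d+)\]")
PATH_RE = re.compile(r"SSTableReader\(path='([^']+)'\)")


def obsolete_inputs(log_lines, restart_marker="Cassandra version:"):
    """Return input sstable paths whose compaction demonstrably finished.

    restart_marker is an ASSUMPTION: a substring expected in a startup
    line of the log. Check your own system.log and adjust it.
    """
    pending = {}   # thread name -> inputs from its last "Compacting" line
    obsolete = []
    for line in log_lines:
        if restart_marker in line:
            # A restart voids every compaction that had not completed yet.
            pending.clear()
            continue
        match = THREAD_RE.search(line)
        if not match:
            continue
        thread = match.group(1)
        if " Compacting " in line:
            pending[thread] = PATH_RE.findall(line)
        elif " Compacted " in line and thread in pending:
            # Same thread, no restart in between: the inputs are obsolete.
            obsolete.extend(pending.pop(thread))
    return obsolete
```

The function only reports candidates; it deletes nothing.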
>>
>>
>>>
>>> INFO [CompactionExecutor:1957] 2020-01-20 06:44:56,721
>>> CompactionTask.java (line 120) Compacting
>>> [SSTableReader(path='/var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-123456-Data.db'),
>>> SSTableReader(path='/var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-234567-Data.db'),
>>> SSTableReader(path='/var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-345678-Data.db')]
>>> INFO [CompactionExecutor:1957] 2020-01-20 12:45:23,270
>>> ColumnFamilyStore.java (line 795) Enqueuing flush of
>>> Memtable-compactions_in_progress@519967741(0/0 serialized/live bytes, 1
>>> ops)
>>> INFO [CompactionExecutor:1957] 2020-01-20 12:45:23,502
>>> CompactionTask.java (line 296) Compacted 3 sstables to
>>> [/var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-456789,].
>>> 136,795,757,524 bytes to 100,529,812,389 (~73% of original) in
>>> 21,626,781ms = 4.433055MB/s. 1,738,999,743 total partitions merged to
>>> 1,274,232,528. Partition merge counts were {1:1049583261, 2:309997005,
>>> 3:23140824, }
>>>
>>>
>> In this case,
>> /var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-123456-*
>> , 
>> /var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-234567-*,
>> and 
>> /var/lib/cassandra/data/keyspace/columnfamily/keyspace-columnfamily-jb-345678-*
>> are all obsolete and should be gc'd "soon". If they're not being gc'd,
>> there's something wrong and you should figure out what's going on. The
>> cases where this happened in 2.0.x (which is what you're running) were
>> usually pretty nasty bugs, and consider this a reason why you should be
>> upgrading.
>>
>> Note that if you just `rm` those files, you'll probably throw
>> FileNotFound exceptions and break the node until you restart, which is bad.
>> You'd have to stop the host, confirm everything is shut down, then remove
>> that 137GB worth of input files if they still exist.
>>
>> Also, please upgrade to 2.1.20. Your life will probably be much easier
>> because of it.
>>
>> As with all things, these are personal opinions; I can't guarantee they're
>> safe. Manually mucking around with database data files is scary: make sure
>> you have a backup, practice in a lab, etc.
>>
>>
>>> Regards
>>> Manish
>>>
>>>
>>> On Tue, Jan 21, 2020 at 9:09 PM Nitan Kainth 
>>> wrote:
>>>
 If you are certain that you don't need the data, your plan is good. Make
 sure to delete all the files for any given sequence number, i.e. data, index,
 TOC, etc.
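
 To make "all the files for a given sequence number" concrete, here is a small
 Python sketch that only lists the component files of one sstable generation.
 It assumes the 2.0-era naming seen in the log excerpt in this thread
 (keyspace-columnfamily-jb-123456-Data.db); component_files is a hypothetical
 helper name, and the script removes nothing.

```python
from pathlib import Path


def component_files(data_dir, keyspace, table, version, generation):
    """List every on-disk component of one sstable generation.

    Assumes names shaped like <keyspace>-<table>-<version>-<generation>-<Component>.
    The trailing '-' in the prefix keeps generation 12 from matching 123.
    """
    prefix = f"{keyspace}-{table}-{version}-{generation}-"
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

 Review the listing with the node stopped, as advised elsewhere in this thread,
 before removing anything by hand.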

 Regards,

 Nitan

 On Jan 21, 2020, at 5:36 AM, manish khandelwal <
 manishkhandelwa...@gmail.com> wrote:

 
 Hi Team

 I am observing some obsolete files in Cassandra 2.0.14 which have
 already been compacted but were not removed from disk after compaction.
 As per CASSANDRA-7872, after the GC grace period has passed the sstables
 are open for read again and can lead to data resurrection. I am also facing
 a disk crunch (90% full) and need to remove those obsolete files ASAP.


 To avoid this, what should our strategy be? I am thinking along the following
 lines:
 1. Stop the Cassandra server.
 2. Remove the obsolete files 

Re: sstableloader: How much does it actually need?

2020-02-06 Thread manish khandelwal
Yes, any two nodes together will have all the data, provided no mutations were
dropped at the node level or the data has been repaired.

For example, suppose you have data A, B, C and D, with RF=3 and 4 nodes (node1,
node2, node3 and node4):

Data A is in node1, node2 and node3
Data B is in node2, node3, and node4
Data C is in node3, node4 and node1
Data D is in node4, node1 and node2

With this configuration, any *two nodes combined* will give all the data.
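
The claim can be checked by brute force for this specific layout. The Python
sketch below copies the placement from the example above and tests every pair
of nodes. It says nothing about dropped mutations or unrepaired data, which
remain the caveat stated at the top of this message.

```python
from itertools import combinations

# Replica placement copied from the example above (RF=3, 4 nodes).
replicas = {
    "A": {"node1", "node2", "node3"},
    "B": {"node2", "node3", "node4"},
    "C": {"node3", "node4", "node1"},
    "D": {"node4", "node1", "node2"},
}
nodes = sorted(set().union(*replicas.values()))


def covered(chosen):
    """True if every datum has at least one replica on the chosen nodes."""
    return all(owners & set(chosen) for owners in replicas.values())


pairs_ok = all(covered(pair) for pair in combinations(nodes, 2))
singles_ok = any(covered([n]) for n in nodes)
print(pairs_ok, singles_ok)  # True False
```

Each datum is missing from exactly one node, so any two nodes always include
at least one replica of it, while no single node holds everything.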


Regards
Manish

On Fri, Feb 7, 2020 at 12:53 AM Voytek Jarnot 
wrote:

> Been thinking about it, and I can't really see how with 4 nodes and RF=3,
> any 2 nodes would *not* have all the data; but am more than willing to
> learn.
>
> On the other thing: that's an attractive option, but in our case, the
> target cluster will likely come into use before the source-cluster data is
> available to load. Seemed to me the safest approach was sstableloader.
>
> Thanks
>
> On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez  wrote:
>
>> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
>> full copy of data. I'd rather not say "it depends". 
>>
>> TIP: If the nodes in the target cluster have identical tokens allocated,
>> you can just do a straight copy of the sstables node-for-node then do
>> nodetool refresh. If the target cluster is already built and you can't
>> assign the same tokens then sstableloader is your only option. Cheers!
>>
>> P.S. No need to apologise for asking questions. That's what we're all
>> here for. Just keep them coming. 
>>
>>>


Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Erick Ramirez
>
> So do you advise copying tokens in such cases ? What procedure is
> advisable ?
>

Specifically for your case with 3 nodes + RF=3, it won't make a difference
so leave it as it is.


> Latency increased on target cluster.
>

Have you tried to run a trace of the queries which are slow? It will help
you identify where the slowness is coming from. Cheers!


Re: Running select against cassandra

2020-02-06 Thread Erick Ramirez
>
> Also is materialized view good for production?


I agree with Sean's and Reid's sentiments about MVs. I still think of MVs
as being experimental and not ready for primetime. I would wait for the
improvements which may be coming in C* 4.0 but no promises there... yet. :)
Cheers!


Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Ankit Gadhiya
Thanks Erick.
So do you advise copying tokens in such cases? What procedure is advisable?

Latency increased on the target cluster. I’ll double-check the storage disks, but
they should be the same.


— Ankit

On Thu, Feb 6, 2020 at 9:07 PM Erick Ramirez  wrote:

>> I didn’t copy tokens since it’s an identical cluster and we have RF as 3
>> on 3 node cluster. Is it still needed , why?
>>
>
> In C*, same number of nodes alone isn't enough. Clusters aren't really
> identical unless token assignments are the same. In your case though since
> each node has a full copy of the data (RF = N nodes), they "appear"
> identical.
>
>> I recently migrated Cassandra keyspace data from one Azure cluster (3
>> Nodes) to another (3 nodes different region) using simple sstable copy.
>> Post this , we are observing overall response time has increased and
>> timeouts every 20 mins.
>>
>
>  You mean the response time on the source cluster increased? Or the
> destination cluster? I can't see how the copy could affect latency unless
> you're using premium storage disks and you've maxed out the throughput on
> them. For example, P30 disks are capped at 200MB/s.
>
>> Do I need to copy anything from system*
>
>
> No, system tables are local to a node. Only ever copy the application
> keyspaces. Cheers!
>
-- 
*Thanks & Regards,*
*Ankit Gadhiya*


Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Erick Ramirez
>
> I didn’t copy tokens since it’s an identical cluster and we have RF as 3
> on 3 node cluster. Is it still needed , why?
>

In C*, same number of nodes alone isn't enough. Clusters aren't really
identical unless token assignments are the same. In your case though since
each node has a full copy of the data (RF = N nodes), they "appear"
identical.

> I recently migrated Cassandra keyspace data from one Azure cluster (3
> Nodes) to another (3 nodes different region) using simple sstable copy.
> Post this , we are observing overall response time has increased and
> timeouts every 20 mins.
>

 You mean the response time on the source cluster increased? Or the
destination cluster? I can't see how the copy could affect latency unless
you're using premium storage disks and you've maxed out the throughput on
them. For example, P30 disks are capped at 200MB/s.

> Do I need to copy anything from system*


No, system tables are local to a node. Only ever copy the application
keyspaces. Cheers!


Re: Running select against cassandra

2020-02-06 Thread Abdul Patel
Thanks all for the valuable inputs.
I agree we need to define the queries first and then plan the table schema, but
the server has been live in production for 2 years and this is a new requirement,
so changing the schema is not an option, and a secondary index is also a bad idea.

I was thinking of going with a materialized view, or testing how the select
performs in non-prod, and seeing which fares better.
So I wanted to see if we can do anything other than that with the existing schema.
The COPY option was also discussed, but COPY doesn't support a WHERE clause.


On Thursday, February 6, 2020, Reid Pinchback 
wrote:

> I defer to Sean’s comment on materialized views.  I’m more familiar with
> DynamoDB on that front, where you do this pretty routinely.  I was curious
> so I went looking. This appears to be the C* Jira that points to many of
> the problem points:
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13826
>
>
>
> Abdul, you’d probably want to refer to that or similar info.  Could be
> that the more practical resolution is to just have the client write the
> data twice, if there are two very different query patterns to support.
> Writes usually have quite low latency in C*, so double-writing may be less
> of a performance hit, and less of a later drag on memory or I/O, than a query
> model that makes you browse through more data than necessary.
>
>
>
> *From: *"Durity, Sean R" 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Thursday, February 6, 2020 at 4:24 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *RE: [EXTERNAL] Re: Running select against cassandra
>
>
>
> *Message from External Sender*
>
> Reid is right. You build the tables to easily answer the queries you want.
> So, start with the query! I inferred a query for you based on what you
> mentioned. If my inference is wrong, the table structure is likely wrong,
> too.
>
>
>
> So, what kind of query do you want to run?
>
>
>
> (NOTE: a select count(*) that is not restricted to within a single
> partition is a very bad option. Don’t do that)
>
>
>
> The query for my table below is simply:
>
> select user_count [, other columns] from users_by_day where app_date = ? and
> hour = ? and minute = ?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Reid Pinchback 
> *Sent:* Thursday, February 6, 2020 4:10 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Running select against cassandra
>
>
>
> Abdul,
>
>
>
> When in doubt, have a query model that immediately feeds you exactly what
> you are looking for. That’s kind of the data model philosophy that you want
> to shoot for as much as feasible with C*.
>
>
>
> The point of Sean’s table isn’t the similarity to yours, it is how he has
> it keyed because it suits a partition structure much better aligned with
> what you want to request.  So I’d say yes, if a materialized view is how
> you want to achieve a denormalized state where the query model directly
> supports giving you what you want to query for, that sounds like an
> appropriate option to consider.  You might want a composite partition key
> for having an efficient selection of narrow time ranges.
>
>
>
> *From: *Abdul Patel 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Thursday, February 6, 2020 at 2:42 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: [EXTERNAL] Re: Running select against cassandra
>
>
>
> *Message from External Sender*
>
> This is a schema similar to what we have. They want to get the count of
> concurrently connected users every, say, 1-5 minutes.
>
> I am wondering whether a simple select will have performance issues, or
> whether we should go for materialized views?
>
>
>
> CREATE TABLE  usr_session (
>
> userid bigint,
>
> session_usr text,
>
> last_access_time timestamp,
>
> login_time timestamp,
>
> status int,
>
> PRIMARY KEY (userid, session_usr)
>
> ) WITH CLUSTERING ORDER BY (session_usr ASC)
>
>
>
>
>
> On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
> wrote:
>
> Do you only need the current count or do you want to keep the historical
> counts also? By active users, does that mean some kind of user that the
> application tracks (as opposed to the Cassandra user connected to the
> cluster)?
>
>
>
> I would consider a table like this for tracking active users through time:
>
>
>
> CREATE TABLE users_by_day (
>
> app_date date,
>
> hour int,
>
> minute int,
>
> user_count int,
>
> longest_login_user text,
>
> longest_login_seconds int,
>
> last_login timestamp,
>
> last_login_user text,
>
> PRIMARY KEY (app_date, hour, minute)
>
> );
>
>
>
> Then, your reporting can easily select full days or a specific, one-minute
> slice. Of course, the app would need to have a timer and write out the
> data. I would also suggest a TTL on the data so that you only keep what you
> need (a week, a year, whatever). Of course, if your reporting requires
> different granularities, you could consider a different time bucket for the
> table (by hour, by week, etc.)
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Abdul Patel 
> 

Re: Nodes becoming unresponsive

2020-02-06 Thread Erick Ramirez
>
> I tried to debug further and could see in the top output that the busy threads
> (COMMAND column) are MutationStage. Does that give us any clue?
>

 That just means there's lots of writes hitting your cluster. Without the
thread dump, it would be difficult to know if the threads are blocked by
futex_wait or whatever else is going on. Cheers!


Re: Nodes becoming unresponsive

2020-02-06 Thread Surbhi Gupta
I have limited options for using JDK-based tools because our environment runs
only a JRE.

I tried to debug further and could see in the top output that the busy threads
(COMMAND column) are MutationStage. Does that give us any clue?

top - 16:30:47 up 94 days,  5:33,  1 user,  load average: 134.83, 142.48, 144.75
Tasks: 564 total,  58 running, 506 sleeping,   0 stopped,   0 zombie
Cpu(s): 95.2%us,  2.5%sy,  0.3%ni,  1.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  132236016k total, 131378384k used,   857632k free,   189208k buffers
Swap:        0k total,        0k used,        0k free, 94530140k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ WCHAN     COMMAND
11798 cassandr  20   0  261g  42g  14g R 14.4 33.3  76:47.38 -         MutationStage-1
11762 cassandr  20   0  261g  42g  14g S 14.1 33.3  82:19.84 -         MutationStage-9
11530 cassandr  20   0  261g  42g  14g R 13.8 33.3 100:00.22 -         MutationStage-3
11501 cassandr  20   0  261g  42g  14g S 13.4 33.3   2598:38 -         MutationStage-5
11688 cassandr  20   0  261g  42g  14g S 13.1 33.3  90:42.47 -         MutationStage-5
11512 cassandr  20   0  261g  42g  14g R 12.8 33.3 153:13.59 -         MutationStage-1
11534 cassandr  20   0  261g  42g  14g R 12.8 33.3 104:48.21 -         MutationStage-2
11708 cassandr  20   0  261g  42g  14g S 12.5 33.3  87:17.64 -         MutationStage-6
11783 cassandr  20   0  261g  42g  14g S 12.5 33.3  76:01.10 futex_wai MutationStage-1
11792 cassandr  20   0  261g  42g  14g S 12.5 33.3  76:19.90 futex_wai MutationStage-1
11504 cassandr  20   0  261g  42g  14g S 12.2 33.3 859:10.54 futex_wai MutationStage-8
11517 cassandr  20   0  261g  42g  14g R 12.2 33.3 116:18.38 -         MutationStage-2
11535 cassandr  20   0  261g  42g  14g R 12.2 33.3  96:11.11 -         MutationStage-3
11710 cassandr  20   0  261g  42g  14g R 12.2 33.3  86:50.77 -         MutationStage-7
11730 cassandr  20   0  261g  42g  14g S 12.2 33.3  78:36.04 -         MutationStage-1
11743 cassandr  20   0  261g  42g  14g R 12.2 33.3  80:27.18 -         MutationStage-1
11773 cassandr  20   0  261g  42g  14g R 12.2 33.3  79:29.48 -         MutationStage-1
11800 cassandr  20   0  261g  42g  14g S 12.2 33.3  77:01.39 futex_wai MutationStage-1
11830 cassandr  20   0  261g  42g  14g R 12.2 33.3  70:47.18 -         MutationStage-1
11495 cassandr  20   0  261g  42g  14g R 11.8 33.3   7693:04 -         MutationStage-3
11675 cassandr  20   0  261g  42g  14g R 11.8 33.3  94:13.22 -         MutationStage-4
11683 cassandr  20   0  261g  42g  14g S 11.8 33.3  91:42.91 futex_wai MutationStage-4
11701 cassandr  20   0  261g  42g  14g S 11.8 33.3  85:16.00 -         MutationStage-7
11703 cassandr  20   0  261g  42g  14g R 11.8 33.3  88:33.81 -         MutationStage-6
11725 cassandr  20   0  261g  42g  14g R 11.8 33.3  78:12.70 -         MutationStage-1
11752 cassandr  20   0  261g  42g  14g S 11.8 33.3  83:25.14 futex_wai MutationStage-9
11755 cassandr  20   0  261g  42g  14g R 11.8 33.3  82:38.87 -         MutationStage-9
11776 cassandr  20   0  261g  42g  14g S 11.8 33.3  79:31.49 futex_wai MutationStage-1
11781 cassandr  20   0  261g  42g  14g R 11.8 33.3  75:01.54 -         MutationStage-1
11796 cassandr  20   0  261g  42g  14g S 11.8 33.3  77:03.78 -         MutationStage-1
11804 cassandr  20   0  261g  42g  14g R 11.8 33.3  81:38.46 -         MutationStage-1
11818 cassandr  20   0  261g  42g  14g S 11.8 33.3  76:51.42 -         MutationStage-1
11823 cassandr  20   0  261g  42g  14g R 11.8 33.3  75:56.69 -         MutationStage-1
11506 cassandr  20   0  261g  42g  14g R 11.5 33.3 502:50.67 -         MutationStage-1
11513 cassandr  20   0  261g  42g  14g R 11.5 33.3 140:00.60 -         MutationStage-1
11515 cassandr  20   0  261g  42g  14g S 11.5 33.3 123:31.16 futex_wai MutationStage-1
11676 cassandr  20   0  261g  42g  14g S 11.5 33.3  93:44.36 futex_wai MutationStage-4
11680 cassandr  20   0  261g  42g  14g S 11.5 33.3  93:28.55 futex_wai MutationStage-4
11706 cassandr  20   0  261g  42g  14g R 11.5 33.3  89:17.10 -         MutationStage-6
11729 cassandr  20   0  261g  42g  14g R 11.5 33.3  78:42.33 -         MutationStage-1
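
For anyone wanting to summarise output like this rather than read it by eye,
here is a rough Python sketch that tallies threads by state and wait channel.
It assumes a per-thread top listing with exactly the column order shown in
this message and one thread per line (rejoin the rows first if a mail client
wrapped them). tally_threads is a hypothetical helper name.

```python
from collections import Counter


def tally_threads(top_rows, name_prefix="MutationStage"):
    """Count threads by (state, wchan) from per-thread top rows.

    Assumes this column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ WCHAN COMMAND
    """
    counts = Counter()
    for row in top_rows:
        cols = row.split()
        if len(cols) != 13 or not cols[0].isdigit():
            continue  # header, blank or wrapped line
        state, wchan, command = cols[7], cols[11], cols[12]
        if command.startswith(name_prefix):
            counts[(state, wchan)] += 1
    return counts
```

A tally like this only shows how many threads are runnable versus sleeping in
a given wait channel; it does not replace a thread dump or a profiler.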


On Thu, 6 Feb 2020 at 10:17, Elliott Sims  wrote:

> Async-profiler (https://github.com/jvm-profiling-tools/async-profiler )
> flamegraphs can also be a really good tool to figure out the exact
> callgraph that's leading to the futex_wait, both in and out of the JVM.
>


Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Reid Pinchback
I defer to Sean’s comment on materialized views.  I’m more familiar with 
DynamoDB on that front, where you do this pretty routinely.  I was curious so I 
went looking. This appears to be the C* Jira that points to many of the problem 
points:

https://issues.apache.org/jira/browse/CASSANDRA-13826

Abdul, you’d probably want to refer to that or similar info.  Could be that the 
more practical resolution is to just have the client write the data twice, if 
there are two very different query patterns to support.  Writes usually have
quite low latency in C*, so double-writing may be less of a performance hit,
and less of a later drag on memory or I/O, than a query model that makes you
browse through more data than necessary.
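
To illustrate the double-write idea only, here is a toy Python sketch that
uses in-memory dictionaries as stand-ins for two tables. It is not driver
code. sessions_by_minute is a hypothetical second table keyed by a one-minute
bucket, and the function names are made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# In-memory stand-ins for two tables; a real client would issue two INSERTs.
sessions_by_user = {}                   # keyed like usr_session: (userid, session_usr)
sessions_by_minute = defaultdict(set)   # hypothetical table keyed by minute bucket


def record_access(userid, session_usr, when):
    """Write one access event to both layouts (the 'double write')."""
    sessions_by_user[(userid, session_usr)] = when
    bucket = when.replace(second=0, microsecond=0)
    sessions_by_minute[bucket].add((userid, session_usr))


def active_in_minute(when):
    """Answer the reporting question from a single bucket, with no scan."""
    bucket = when.replace(second=0, microsecond=0)
    return len(sessions_by_minute.get(bucket, ()))
```

The trade-off is the one described above: each event costs two writes, and in
exchange the per-minute count is read from one key instead of a scan over
every session.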

From: "Durity, Sean R" 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 6, 2020 at 4:24 PM
To: "user@cassandra.apache.org" 
Subject: RE: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
Reid is right. You build the tables to easily answer the queries you want. So, 
start with the query! I inferred a query for you based on what you mentioned. 
If my inference is wrong, the table structure is likely wrong, too.

So, what kind of query do you want to run?

(NOTE: a select count(*) that is not restricted to within a single partition is 
a very bad option. Don’t do that)

The query for my table below is simply:
select user_count [, other columns] from users_by_day where app_date = ? and
hour = ? and minute = ?


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
what you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel <abd786...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
This is a schema similar to what we have. They want to get the count of
concurrently connected users every, say, 1-5 minutes.
I am wondering whether a simple select will have performance issues, or whether
we should go for materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
app_date date,
hour int,
minute int,
user_count int,
longest_login_user text,
longest_login_seconds int,
last_login timestamp,
last_login_user text,
PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel <abd786...@gmail.com>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of "users connected": the app team needs the number of active users
connected, say, every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler <mich...@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 

RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Reid is right. You build the tables to easily answer the queries you want. So, 
start with the query! I inferred a query for you based on what you mentioned. 
If my inference is wrong, the table structure is likely wrong, too.

So, what kind of query do you want to run?

(NOTE: a select count(*) that is not restricted to within a single partition is 
a very bad option. Don’t do that)

The query for my table below is simply:
select user_count [, other columns] from users_by_day where app_date = ? and
hour = ? and minute = ?
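
As a sketch of how an application timer might derive the three bind values for
that query, here is a small Python helper. Normalising to UTC is an assumption
made for this example, not something stated in the thread, and minute_bucket
is a hypothetical name.

```python
from datetime import datetime, timezone


def minute_bucket(ts):
    """Split a timezone-aware timestamp into (app_date, hour, minute).

    ASSUMPTION: buckets are kept in UTC so that all writers agree on the key.
    """
    ts = ts.astimezone(timezone.utc)
    return ts.date(), ts.hour, ts.minute
```

The writer and the reporting query would both call the same helper, so that a
given minute always maps to the same row.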


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
what you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel <abd786...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
This is a schema similar to what we have. They want to get the count of
concurrently connected users every, say, 1-5 minutes.
I am wondering whether a simple select will have performance issues, or whether
we should go for materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
app_date date,
hour int,
minute int,
user_count int,
longest_login_user text,
longest_login_seconds int,
last_login timestamp,
last_login_user text,
PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel <abd786...@gmail.com>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of "users connected": the app team needs the number of active users
connected, say, every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler <mich...@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run a select query every minute to fetch data from
Cassandra for reporting purposes? If not, what's the alternative?

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
user-h...@cassandra.apache.org



The information in this Internet Email is confidential and may be legally 
privileged. It is 

RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
From reports on this mailing list, I do not allow materialized views.


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
what you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel <abd786...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Message from External Sender
This is a schema similar to what we have. They want to get the count of
concurrently connected users every, say, 1-5 minutes.
I am wondering whether a simple select will have performance issues, or whether
we should go for materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
app_date date,
hour int,
minute int,
user_count int,
longest_login_user text,
longest_login_seconds int,
last_login timestamp,
last_login_user text,
PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel <abd786...@gmail.com>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected: the app team needs the number of active users 
connected, say every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler <mich...@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run a select query every minute to fetch data from 
Cassandra for reporting purposes? If not, what's the alternative?

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 
user-h...@cassandra.apache.org



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or 

Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Reid Pinchback
Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
what you want to query for, that sounds like an appropriate option to consider. 
You might want a composite partition key for having an efficient selection of 
narrow time ranges.
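
To make the composite partition key idea concrete, here is a minimal Python sketch (not from the thread). It assumes a hypothetical key of (date, hour) and only groups sample timestamps by that key, so you can see that a narrow time range touches just one or two partitions. The function names and the hour-sized bucket are assumptions for illustration, not a recommendation for any particular workload.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def partition_key(ts: datetime) -> tuple:
    """Hypothetical composite partition key: (date, hour)."""
    return (ts.date().isoformat(), ts.hour)


def group_by_partition(timestamps):
    """Group the minute of each timestamp under its (date, hour) partition."""
    partitions = defaultdict(list)
    for ts in timestamps:
        partitions[partition_key(ts)].append(ts.minute)
    return dict(partitions)


# Five one-minute samples that straddle an hour boundary (13:58 .. 14:02).
start = datetime(2020, 2, 6, 13, 58)
samples = [start + timedelta(minutes=i) for i in range(5)]
partitions = group_by_partition(samples)

for key, minutes in sorted(partitions.items()):
    print(key, minutes)
```

With hour-sized partitions, a five-minute report reads at most two partitions; with a date-only key it would always read from the whole day's partition.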

From: Abdul Patel 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" 
Subject: Re: [EXTERNAL] Re: Running select against cassandra

This is a schema similar to what we have. They want the count of concurrently 
connected users every 1-5 minutes or so.
I am wondering whether a simple select will have performance issues, or whether 
we should go for materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
    app_date date,
    hour int,
    minute int,
    user_count int,
    longest_login_user text,
    longest_login_seconds int,
    last_login timestamp,
    last_login_user text,
    -- app_date is the partition key; hour and minute are clustering columns
    PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel <abd786...@gmail.com>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected: the app team needs the number of active users 
connected, say every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler <mich...@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run a select query every minute to fetch data from 
Cassandra for reporting purposes? If not, what's the alternative?





Re: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Abdul Patel
This is a schema similar to what we have. They want the count of
concurrently connected users every 1-5 minutes or so.
I am wondering whether a simple select will have performance issues, or
whether we should go for materialized views?

CREATE TABLE usr_session (
    userid bigint,
    session_usr text,
    last_access_time timestamp,
    login_time timestamp,
    status int,
    PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC);
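
For reference, this is roughly the calculation the report needs, written as a minimal Python sketch over rows shaped like usr_session. It is an illustration, not our actual code: the meaning of status (1 = active here) and the 5-minute window are assumptions. Note that it has to look at every session row, and since userid is the partition key, the equivalent query touches every partition in the table.

```python
from datetime import datetime, timedelta

ACTIVE = 1  # assumption: status 1 means the session is active


def concurrent_users(sessions, now, window=timedelta(minutes=5)):
    """Count distinct users with an active session seen within the window."""
    cutoff = now - window
    active_ids = {
        s["userid"]
        for s in sessions
        if s["status"] == ACTIVE and s["last_access_time"] >= cutoff
    }
    return len(active_ids)


now = datetime(2020, 2, 6, 14, 10)
# Made-up sample rows, one dict per (userid, session_usr) row.
sessions = [
    {"userid": 1, "session_usr": "a", "status": 1,
     "last_access_time": datetime(2020, 2, 6, 14, 8)},
    {"userid": 1, "session_usr": "b", "status": 1,
     "last_access_time": datetime(2020, 2, 6, 14, 9)},
    {"userid": 2, "session_usr": "c", "status": 1,
     "last_access_time": datetime(2020, 2, 6, 13, 50)},  # stale
    {"userid": 3, "session_usr": "d", "status": 0,
     "last_access_time": datetime(2020, 2, 6, 14, 7)},   # not active
]

count = concurrent_users(sessions, now)
print(count)
```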



On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
wrote:

> Do you only need the current count or do you want to keep the historical
> counts also? By active users, does that mean some kind of user that the
> application tracks (as opposed to the Cassandra user connected to the
> cluster)?
>
>
>
> I would consider a table like this for tracking active users through time:
>
>
>
> CREATE TABLE users_by_day (
>     app_date date,
>     hour int,
>     minute int,
>     user_count int,
>     longest_login_user text,
>     longest_login_seconds int,
>     last_login timestamp,
>     last_login_user text,
>     -- app_date is the partition key; hour and minute are clustering columns
>     PRIMARY KEY (app_date, hour, minute)
> );
>
>
>
> Then, your reporting can easily select full days or a specific, one-minute
> slice. Of course, the app would need to have a timer and write out the
> data. I would also suggest a TTL on the data so that you only keep what you
> need (a week, a year, whatever). Of course, if your reporting requires
> different granularities, you could consider a different time bucket for the
> table (by hour, by week, etc.)
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Abdul Patel 
> *Sent:* Thursday, February 6, 2020 1:54 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Running select against cassandra
>
>
>
> It's sort of users connected: the app team needs the number of active users
> connected, say every 1 to 5 mins.
>
> The timeout at the app end is 120ms.
>
>
>
>
>
> On Thursday, February 6, 2020, Michael Shuler 
> wrote:
>
> You'll have to be more specific. What is your table schema and what is the
> SELECT query? What is the normal response time?
>
> As a basic guide for your general question, if the query is something sort
> of irrelevant that should be stored some other way, like a total row count,
> or most any SELECT that requires ALLOW FILTERING, you're doing it wrong and
> should re-evaluate your data model.
>
> 1 query per minute is a minuscule fraction of the basic capacity of
> queries per minute that a Cassandra cluster should be able to handle with
> good data modeling and table-relevant query. All depends on the data model
> and query.
>
> Michael
>
> On 2/6/20 12:20 PM, Abdul Patel wrote:
>
> Hi,
>
> Is it advisable to run a select query every minute to fetch data from
> Cassandra for reporting purposes? If not, what's the alternative?
>
>


Re: sstableloader: How much does it actually need?

2020-02-06 Thread Voytek Jarnot
Been thinking about it, and I can't really see how with 4 nodes and RF=3,
any 2 nodes would *not* have all the data; but am more than willing to
learn.
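
Here is the reasoning as a small Python sketch, in case I'm missing something. It is a simplified model only: one token per node and SimpleStrategy-style placement on consecutive ring neighbours are my assumptions, and the node names are made up. It checks replica placement, not whether every replica has actually received every write, which may be where the lack of a guarantee comes from.

```python
from itertools import combinations

NODES = ["n1", "n2", "n3", "n4"]  # hypothetical node names
RF = 3


def replica_sets(nodes, rf):
    """Replica set per token range: the owner plus the next rf-1 ring neighbours."""
    n = len(nodes)
    return [{nodes[(i + k) % n] for k in range(rf)} for i in range(n)]


def pairs_missing_a_range(nodes, rf):
    """Node pairs for which at least one token range has no replica on either node."""
    ranges = replica_sets(nodes, rf)
    return [
        pair
        for pair in combinations(nodes, 2)
        if any(not (set(pair) & replicas) for replicas in ranges)
    ]


uncovered = pairs_missing_a_range(NODES, RF)
print(uncovered)
```

With 4 nodes and RF=3 each range is absent from exactly one node, so no pair can miss it; the same check with RF=2 does report uncovered pairs.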

On the other thing: that's an attractive option, but in our case, the
target cluster will likely come into use before the source-cluster data is
available to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez  wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
> full copy of data. I'd rather not say "it depends". 
>
> TIP: If the nodes in the target cluster have identical tokens allocated,
> you can just do a straight copy of the sstables node-for-node then do nodetool
> refresh. If the target cluster is already built and you can't assign the
> same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here
> for. Just keep them coming. 
>


RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
    app_date date,
    hour int,
    minute int,
    user_count int,
    longest_login_user text,
    longest_login_seconds int,
    last_login timestamp,
    last_login_user text,
    -- app_date is the partition key; hour and minute are clustering columns
    PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)
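
As a rough illustration of the timer-side logic, here is a minimal Python sketch. It assumes the app derives the key columns from the sample time and expresses the retention period as a TTL in seconds; the function names and the user_count value are made up for the example, and the actual write to Cassandra is left out.

```python
from datetime import datetime, timedelta


def minute_bucket(ts: datetime) -> dict:
    """Key columns for the one-minute slice that contains ts."""
    return {
        "app_date": ts.date().isoformat(),
        "hour": ts.hour,
        "minute": ts.minute,
    }


def ttl_seconds(keep: timedelta) -> int:
    """Retention period expressed in whole seconds, as a TTL expects."""
    return int(keep.total_seconds())


# One sample taken by the app's timer; seconds are dropped by the bucketing.
row = minute_bucket(datetime(2020, 2, 6, 14, 9, 37))
row["user_count"] = 42  # made-up value for illustration
week_ttl = ttl_seconds(timedelta(weeks=1))

print(row, week_ttl)
```

Changing the granularity (say, to five-minute buckets) would only change how minute_bucket rounds the timestamp.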


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel 
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected: the app team needs the number of active users 
connected, say every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler <mich...@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run a select query every minute to fetch data from 
Cassandra for reporting purposes? If not, what's the alternative?





The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: Running select against cassandra

2020-02-06 Thread Abdul Patel
Also, are materialized views good for production?
We are on 3.11.4.

On Thursday, February 6, 2020, Abdul Patel  wrote:

> It's sort of users connected: the app team needs the number of active users
> connected, say every 1 to 5 mins.
> The timeout at the app end is 120ms.
>
>
>
> On Thursday, February 6, 2020, Michael Shuler 
> wrote:
>
>> You'll have to be more specific. What is your table schema and what is
>> the SELECT query? What is the normal response time?
>>
>> As a basic guide for your general question, if the query is something
>> sort of irrelevant that should be stored some other way, like a total row
>> count, or most any SELECT that requires ALLOW FILTERING, you're doing it
>> wrong and should re-evaluate your data model.
>>
>> 1 query per minute is a minuscule fraction of the basic capacity of
>> queries per minute that a Cassandra cluster should be able to handle with
>> good data modeling and table-relevant query. All depends on the data model
>> and query.
>>
>> Michael
>>
>> On 2/6/20 12:20 PM, Abdul Patel wrote:
>>
>>> Hi,
>>>
>>> Is it advisable to run a select query every minute to fetch data from
>>> Cassandra for reporting purposes? If not, what's the alternative?
>>>
>>>
>>>
>>
>>


Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Ankit Gadhiya
Hi Michael,

Thanks for your response.

I didn’t copy tokens since it’s an identical cluster and we have RF 3 on a
3-node cluster. Is it still needed, and if so, why?

I don’t see anything unusual in the Cassandra log, but I don’t have debug
logging enabled.


Thanks & Regards,
Ankit

On Thu, Feb 6, 2020 at 1:47 PM Michael Shuler 
wrote:

> Did you copy the tokens from cluster1 to new cluster2? Same Cassandra
> version, same instance type/size? What do the logs say on cluster2 that
> look different from the cluster1 norm? There are a number of possible
> `nodetool` utilities that may help see what is happening on new cluster2.
>
> Michael
>
> On 2/6/20 8:09 AM, Ankit Gadhiya wrote:
> > Hi Folks,
> >
> > I recently migrated Cassandra keyspace data from one Azure cluster (3
> > nodes) to another (3 nodes, different region) using a simple sstable copy.
> > Since then, we have observed that overall response time has increased,
> > with timeouts every 20 mins.
> >
> > Has anyone faced this before?
> > Do I need to copy anything from system*?
> > Anything wrt statistics/cache?
> >
> > Your time and responses on this are much appreciated.
> >
> >
> > Thanks & Regards,
> > Ankit
> > --
> > *Thanks & Regards,*
> > *Ankit Gadhiya*
> >
>
>
> --
*Thanks & Regards,*
*Ankit Gadhiya*


Re: Running select against cassandra

2020-02-06 Thread Abdul Patel
It's sort of users connected: the app team needs the number of active users
connected, say every 1 to 5 mins.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
wrote:

> You'll have to be more specific. What is your table schema and what is the
> SELECT query? What is the normal response time?
>
> As a basic guide for your general question, if the query is something sort
> of irrelevant that should be stored some other way, like a total row count,
> or most any SELECT that requires ALLOW FILTERING, you're doing it wrong and
> should re-evaluate your data model.
>
> 1 query per minute is a minuscule fraction of the basic capacity of
> queries per minute that a Cassandra cluster should be able to handle with
> good data modeling and table-relevant query. All depends on the data model
> and query.
>
> Michael
>
> On 2/6/20 12:20 PM, Abdul Patel wrote:
>
>> Hi,
>>
>> Is it advisable to run a select query every minute to fetch data from
>> Cassandra for reporting purposes? If not, what's the alternative?
>>
>>
>>
>
>


Re: Query timeouts after Cassandra Migration

2020-02-06 Thread Michael Shuler
Did you copy the tokens from cluster1 to new cluster2? Same Cassandra 
version, same instance type/size? What do the logs say on cluster2 that 
look different from the cluster1 norm? There are a number of possible 
`nodetool` utilities that may help see what is happening on new cluster2.


Michael

On 2/6/20 8:09 AM, Ankit Gadhiya wrote:

Hi Folks,

I recently migrated Cassandra keyspace data from one Azure cluster (3 
nodes) to another (3 nodes, different region) using a simple sstable copy. 
Since then, we have observed that overall response time has increased, with 
timeouts every 20 mins.


Has anyone faced this before?
Do I need to copy anything from system*?
Anything wrt statistics/cache?

Your time and responses on this are much appreciated.


Thanks & Regards,
Ankit
--
*Thanks & Regards,*
*Ankit Gadhiya*






Re: Running select against cassandra

2020-02-06 Thread Michael Shuler
You'll have to be more specific. What is your table schema and what is 
the SELECT query? What is the normal response time?


As a basic guide for your general question, if the query is something 
sort of irrelevant that should be stored some other way, like a total 
row count, or most any SELECT that requires ALLOW FILTERING, you're 
doing it wrong and should re-evaluate your data model.


1 query per minute is a minuscule fraction of the basic capacity of 
queries per minute that a Cassandra cluster should be able to handle 
with good data modeling and table-relevant query. All depends on the 
data model and query.


Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:

Hi,

Is it advisable to run a select query every minute to fetch data from 
Cassandra for reporting purposes? If not, what's the alternative?








Running select against cassandra

2020-02-06 Thread Abdul Patel
Hi,

Is it advisable to run a select query every minute to fetch data from
Cassandra for reporting purposes? If not, what's the alternative?


Re: Nodes becoming unresponsive

2020-02-06 Thread Elliott Sims
Async-profiler (https://github.com/jvm-profiling-tools/async-profiler)
flamegraphs can also be a really good tool to figure out the exact
callgraph that's leading to the futex_wait, both in and out of the JVM.


Query timeouts after Cassandra Migration

2020-02-06 Thread Ankit Gadhiya
Hi Folks,

I recently migrated Cassandra keyspace data from one Azure cluster (3
nodes) to another (3 nodes, different region) using a simple sstable copy.
Since then, we have observed that overall response time has increased, with
timeouts every 20 mins.

Has anyone faced this before?
Do I need to copy anything from system*?
Anything wrt statistics/cache?

Your time and responses on this are much appreciated.


Thanks & Regards,
Ankit
-- 
*Thanks & Regards,*
*Ankit Gadhiya*