Re: where does c* store the schema?

2018-04-17 Thread Blake Eggleston
Rahul, none of that is true at all. 

 

Each node stores schema locally in a non-replicated system table. Schema 
changes are disseminated directly to live nodes (not the write path), and the 
schema version is gossiped to other nodes. If a node misses a schema update, it 
will figure this out when it notices that its local schema version is behind 
the one being gossiped by the rest of the cluster, and will pull the updated 
schema from the other nodes in the cluster.
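
As a quick sanity check (a minimal sketch; nothing here is specific to your schema), you can compare the schema version a node stores locally with the versions it has learned about its peers via gossip:

SELECT schema_version FROM system.local;

SELECT peer, schema_version FROM system.peers;

If every node reports the same value, the cluster has converged on a single schema; a lagging node will show a different version until it pulls the update as described above.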

 

From: Rahul Singh 
Reply-To: 
Date: Tuesday, April 17, 2018 at 4:13 PM
To: 
Subject: Re: where does c* store the schema?

 

It uses an “everywhere” replication strategy, and it’s recommended to do all alter 
/ create / drop statements with consistency level ALL — meaning it wouldn’t 
make the change to the schema unless all the nodes are up.


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation


On Apr 17, 2018, 12:31 AM -0500, Jinhua Luo , wrote:


Yes, I know it must be in system schema.

But how does c* replicate the user-defined schema to all nodes? If it
applies the same RWN model to them, then what are the R and W?
And when a failed node comes back to the cluster, how does it recover the
schema updates it may have missed during the outage?

2018-04-16 17:01 GMT+08:00 DuyHai Doan :


There is a system_schema keyspace to store all the schema information

https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v
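
For example, a couple of quick queries against it (a sketch, assuming Cassandra 3.0+ where system_schema exists; 'mykeyspace' and 'mytable' are placeholder names):

SELECT keyspace_name, table_name
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace';

SELECT column_name, type
FROM system_schema.columns
WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';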

On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo  wrote:



Hi All,

Does c* use predefined keyspaces/tables to store the user-defined schema?
If so, what's the RWN of those meta schema tables? And what's the procedure
to update them?




Re: Why Cassandra need full repair after incremental repair

2017-11-02 Thread Blake Eggleston
Because in theory, corruption of your repaired dataset is possible, which 
incremental repair won’t fix. 

In practice, pre-4.0 incremental repair has some flaws that can bring deleted 
data back to life in some cases, which a full repair would address. 

You should also evaluate whether pre-4.0 incremental repair is saving you time. 
The same flaws can cause *a lot* of over streaming, which may negate the 
benefit of repairing only the unrepaired data.

> On Nov 2, 2017, at 2:17 AM, dayu  wrote:
> 
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesWhen.html
> 
> So you mean I am being misled by these statements. A full repair is only needed 
> when there is a node failure + replacement, or when adding a datacenter, right?
> 
> 
> 
> 
> At 2017-11-02 15:54:49, "kurt greaves"  wrote:
> Where are you seeing this? If your incremental repairs work properly, full 
> repair is only needed in certain situations, like after node failure + 
> replacement, or adding a datacenter.
> 
> 
>  


Re: Cqlsh unable to switch keyspace after Cassandra upgrade.

2017-11-02 Thread Blake Eggleston
Looks like a bug, could you open a jira?

> On Nov 2, 2017, at 2:08 AM, Mikhail Tsaplin  wrote:
> 
> Hi,
> I've upgraded Cassandra from 2.1.6 to 3.0.9 on a three-node cluster. After the 
> upgrade, 
> cqlsh shows the following error when trying to run the "use {keyspace};" command:
> 'ResponseFuture' object has no attribute 'is_schema_agreed'
> 
> The actual upgrade was done on Ubuntu 16.04 by running the "apt-get upgrade 
> cassandra" command.
> Apt repository is deb http://debian.datastax.com/community stable main.
> Following parameters were migrated from former cassandra.yaml:
> cluster_name, num_tokens, data_file_directories, commit_log_directory, 
> saved_caches_directory, seeds, listen_address, rpc_address, initial_token, 
> auto_bootstrap.
> 
> Later I did an additional test - fetched the 3.0.15 binary distribution from 
> cassandra.apache.org and tried to run Cassandra from this distribution - same error:
> $ ./bin/cqlsh
> Connected to cellwize.cassandra at 172.31.17.42:9042.
> [cqlsh 5.0.1 | Cassandra 3.0.15 | CQL spec 3.4.0 | Native protocol v4]
> Use HELP for help.
> cqlsh> use listener ;
> 'ResponseFuture' object has no attribute 'is_schema_agreed'
> cqlsh> 
> 
> What could be the reason?


Re: Need help with incremental repair

2017-10-30 Thread Blake Eggleston
Ah cool, I didn't realize reaper did that.

On October 30, 2017 at 1:29:26 PM, Paulo Motta (pauloricard...@gmail.com) wrote:

> This is also the case for full repairs, if I'm not mistaken. Assuming I'm not 
> missing something here, that should mean that he shouldn't need to mark 
> sstables as unrepaired? 

That's right, but he mentioned that he is using reaper which uses 
subrange repair if I'm not mistaken, which doesn't do anti-compaction. 
So in that case he should probably mark data as unrepaired when no 
longer using incremental repair. 

2017-10-31 3:52 GMT+11:00 Blake Eggleston <beggles...@apple.com>: 
>> Once you run incremental repair, your data is permanently marked as 
>> repaired 
> 
> This is also the case for full repairs, if I'm not mistaken. I'll admit I'm 
> not as familiar with the quirks of repair in 2.2, but prior to 
> 4.0/CASSANDRA-9143, any global repair ends with an anticompaction that marks 
> sstables as repaired. Looking at the RepairRunnable class, this does seem to 
> be the case. Assuming I'm not missing something here, that should mean that 
> he shouldn't need to mark sstables as unrepaired? 




Re: Need help with incremental repair

2017-10-30 Thread Blake Eggleston
> Once you run incremental repair, your data is permanently marked as repaired

This is also the case for full repairs, if I'm not mistaken. I'll admit I'm not 
as familiar with the quirks of repair in 2.2, but prior to 4.0/CASSANDRA-9143, 
any global repair ends with an anticompaction that marks sstables as repaired. 
Looking at the RepairRunnable class, this does seem to be the case. Assuming 
I'm not missing something here, that should mean that he shouldn't need to mark 
sstables as unrepaired?


Re: Need help with incremental repair

2017-10-28 Thread Blake Eggleston
Hey Aiman,

Assuming the situation is just "we accidentally ran incremental repair", you 
shouldn't have to do anything. It's not going to hurt anything. Pre-4.0 
incremental repair has some issues that can cause a lot of extra streaming, and 
inconsistencies in some edge cases, but as long as you're running full repairs 
before gc grace expires, everything should be ok.

Thanks,

Blake


On October 28, 2017 at 1:28:42 AM, Aiman Parvaiz (ai...@steelhouse.com) wrote:

Hi everyone,

We seek your help with an issue we are facing in our 2.2.8 version.

We have a 24-node cluster spread over 3 DCs.

Initially, when the cluster was in a single DC, we were using The Last Pickle 
Reaper 0.5 to repair it with incremental repair set to false. We added 2 more 
DCs. Now the problem is that, accidentally, on one of the newer DCs we ran 
nodetool repair without realizing that for 2.2 the default option is 
incremental. 

I am not seeing any errors in the logs till now but wanted to know what would 
be the best way to handle this situation. To make things a little more 
complicated, the node on which we triggered this repair is almost out of disk 
and we had to restart C* on it.

I can see a bunch of "anticompaction after repair" under OpsCenter Activities 
across various nodes in the 3 DCs.



Any help or suggestions would be appreciated.

Thanks




Materialized Views marked experimental

2017-10-26 Thread Blake Eggleston
Hi user@,

Following a discussion on dev@, the materialized view feature is being 
retroactively classified as experimental, and not recommended for new 
production uses. The next patch releases of 3.0, 3.11, and 4.0 will include 
CASSANDRA-13959, which will log warnings when materialized views are created, 
and introduce a yaml setting that will allow operators to disable their 
creation.

Concerns about MV’s suitability for production are not uncommon, and this just 
formalizes the advice often given to people considering materialized views. 
That is: materialized views have shortcomings that can make them unsuitable for 
the general use case. Unless you’re familiar with their shortcomings and 
confident they won’t cause problems for your use case, you shouldn’t use them.

The shortcomings I’m referring to are:
* There's no way to determine if a view is out of sync with the base table.
* If you do determine that a view is out of sync, the only way to fix it is to 
drop and rebuild the view.
* Even in the happy path, there isn’t an upper bound on how long it will take for 
updates to be reflected in the view.
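
For context, this is the kind of statement the new warning and yaml setting apply to (a minimal sketch; the keyspace, base table, and column names are hypothetical):

CREATE MATERIALIZED VIEW ks.users_by_email AS
    SELECT * FROM ks.users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

The shortcomings above apply to any view like this, regardless of how simple it looks.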

There is also concern that the materialized view design isn’t known to be 
‘correct’. In other words, it’s a distributed system design that hasn’t been 
extensively modeled and simulated, and there have been no formal proofs about 
its properties.

You should be aware that manually denormalizing your tables has these same 
limitations in most cases, so you may not be losing any guarantees by using 
materialized views in your use case. The reason we’re doing this is because 
users expect correctness from features built into a database. It’s not 
unreasonable for users to think that a database feature will more or less “just 
work”, which is not necessarily the case for materialized views. If a feature 
is considered experimental, users are more likely to spend the time 
understanding its shortcomings before using it in their application.

Thanks,

Blake

The dev@ thread can be found here: 
https://www.mail-archive.com/dev@cassandra.apache.org/msg11511.html




Re: What is a node's "counter ID?"

2017-10-20 Thread Blake Eggleston
I believe that’s just referencing a counter implementation detail. If I 
remember correctly, there was a fairly large improvement of the implementation 
of counters in 2.1, and the assignment of the id would basically be a format 
migration.

> On Oct 20, 2017, at 9:57 AM, Paul Pollack  wrote:
> 
> Hi,
> 
> I was reading the doc page for nodetool cleanup 
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html 
> because I was planning to run it after replacing a node in my counter cluster 
> and the sentence "Cassandra assigns a new counter ID to the node" gave me 
> pause. I can't find any other reference to a node's counter ID in the docs 
> and was wondering if anyone here could shed light on what this means, and how 
> it would affect the data being stored on a node that had its counter ID 
> changed?
> 
> Thanks,
> Paul


Re: Does NTP affects LWT's ballot UUID?

2017-10-11 Thread Blake Eggleston
Since the UUID is used as the ballot in a paxos instance, if it goes backwards 
in time, it will be rejected by the other replicas (if there is a more recent 
instance), and the proposal will fail. However, after the initial rejection, 
the coordinator will try again with the most recently seen ballot +1, which 
should succeed (unless another coordinator has started a proposal with a higher 
ballot in the meantime).
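
For reference, the ballots being discussed are created by lightweight transactions, i.e. conditional statements like these (a sketch; the table and column names are hypothetical):

-- Succeeds only if no row with this key exists yet
INSERT INTO ks.accounts (account_id, balance) VALUES (42, 100) IF NOT EXISTS;

-- Applied only if the expected value is still current
UPDATE ks.accounts SET balance = 90 WHERE account_id = 42 IF balance = 100;

Each of these runs the Paxos round described above, so per the description a stale ballot shows up as a coordinator-side retry rather than an immediate failure.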

On October 10, 2017 at 1:04:22 AM, Daniel Woo (daniel.y@gmail.com) wrote:

Hi DuyHai,

Thanks, and that's exactly what I am asking: what happens if NTP goes backward? NTP 
actually does that often, because clock drift is inevitable.

On Tue, Oct 10, 2017 at 3:13 PM, DuyHai Doan  wrote:
The ballot UUID is obtained using QUORUM agreement between replicas for a given 
partition key and we use this TimeUUID ballot as write-time for the mutation.

The only scenario where I can see a problem is that NTP goes backward in time 
on a QUORUM of replicas, which would break the contract of monotonicity. I 
don't know how likely this event is ...

On Tue, Oct 10, 2017 at 9:07 AM, Daniel Woo  wrote:
Hi guys,

The ballot UUID should be monotonically increasing on each coordinator, but the 
UUID in cassandra is version 1 (timestamp based), what happens if the NTP 
service adjusts system clock while a two phase paxos prepare/commit is in 
progress?

--
Thanks & Regards,
Daniel




--
Thanks & Regards,
Daniel


Re: table repair question

2017-10-04 Thread Blake Eggleston
Incremental repairs should also update the percentage, although I'd recommend 
not using incremental repair before 4.0. Just want to point out that running 
repairs based on repaired % isn't necessarily a bad thing, but it should be a 
secondary consideration. The important thing is to repair data more frequently 
than your gc grace period.
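
If it helps, the gc grace period being referred to is a per-table setting, readable from the schema tables (a sketch, assuming Cassandra 3.0+; 'mykeyspace' is a placeholder):

SELECT table_name, gc_grace_seconds
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace';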


On October 4, 2017 at 1:33:57 PM, Javier Canillas (javier.canil...@gmail.com) 
wrote:

That percentage will only be updated if you do a full repair. If you do repairs 
on the local DC or with -pr, that percentage will not be updated.

I scripted a regular repair on each node based on whether this percentage is below 
some threshold. It has been running fine for several months now.

2017-10-04 12:46 GMT-03:00 Blake Eggleston <beggles...@apple.com>:
Not really no. There's a repaired % in nodetool tablestats if you're using 
incremental repair (and you probably shouldn't be before 4.0 comes out), but I 
wouldn't make any decisions based off its value.


On October 4, 2017 at 8:05:44 AM, ZAIDI, ASAD A (az1...@att.com) wrote:

Hello folk,

 

I’m wondering if there is a way to find out the list of table(s) which need repair, or 
if there is a way to find out what percentage of data would need to be repaired on a 
table. Is such information available from the Cassandra db engine through some 
other means?

 

TIA~ Asad

 

 

 




Re: table repair question

2017-10-04 Thread Blake Eggleston
Not really no. There's a repaired % in nodetool tablestats if you're using 
incremental repair (and you probably shouldn't be before 4.0 comes out), but I 
wouldn't make any decisions based off its value.


On October 4, 2017 at 8:05:44 AM, ZAIDI, ASAD A (az1...@att.com) wrote:

Hello folk,

 

I’m wondering if there is a way to find out the list of table(s) which need repair, or 
if there is a way to find out what percentage of data would need to be repaired on a 
table. Is such information available from the Cassandra db engine through some 
other means?

 

TIA~ Asad

 

 

 

Re: Materialized views stability

2017-10-02 Thread Blake Eggleston
Hi Hannu,

There are more than a few committers that don't think MVs are currently 
suitable for production use. I'm not involved with MV development, so this may 
not be 100% accurate, but the problems as I understand them are: 

* There's no way to determine if a view is out of sync with the base table.
* If you do determine that a view is out of sync, the only way to fix it is to 
drop and rebuild the view.
* There are liveness issues with updates being reflected in the view. 

Any one of these issues makes it difficult to recommend for general application 
development. I'd say that unless you're super familiar with their shortcomings 
and confident you can fit your use case within them, you're probably better off 
not using them.

Thanks,

Blake

On October 2, 2017 at 6:55:52 AM, Hannu Kröger (hkro...@gmail.com) wrote:

Hello,

I have seen some discussions around Materialized Views and stability of that 
functionality.

There are some open issues around repairs:
https://issues.apache.org/jira/browse/CASSANDRA-13810?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20Testing%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20component%20%3D%20%22Materialized%20Views%22

Are the current problems mostly related to incremental repairs, or 
are there also other major concerns why some people don’t consider them to be 
safe for production use?

Cheers,
Hannu



Re: Nodetool repair -pr

2017-09-29 Thread Blake Eggleston
It will on 2.2 and higher, yes.

Also, just want to point out that it would be worth it for you to compare how 
long incremental repairs take vs full repairs in your cluster. There are some 
problems (which are fixed in 4.0) that can cause significant overstreaming when 
using incremental repair.

On September 28, 2017 at 11:46:47 AM, Dmitry Buzolin (dbuz5ga...@gmail.com) 
wrote:

Hi All, 

Can someone confirm if 

"nodetool repair -pr -j2" does run with -inc too? I see the docs mention -inc 
is set by default, but I am not sure if it is enabled when -pr option is used. 

Thanks! 



Re: How to check if repair is actually successful

2017-09-01 Thread Blake Eggleston
If nodetool repair doesn't return an error, and doesn't hang, the repair 
completed successfully.
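
If you want more detail than the exit status, recent repair sessions are also recorded in a system table (a sketch, assuming Cassandra 2.2+ where system_distributed exists; 'mykeyspace' and 'mytable' are placeholders):

SELECT id, coordinator, status, started_at, finished_at, exception_message
FROM system_distributed.repair_history
WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable'
LIMIT 20;

The status and exception_message columns give per-range detail beyond the overall return code.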

On September 1, 2017 at 5:50:53 AM, Akshit Jain (akshit13...@iiitd.ac.in) wrote:

Hi,
I am performing a repair on a Cassandra cluster.
After getting the repair status as successful, how can I figure out whether it 
actually succeeded?
Is there any way to test it?


Re: nodetool gossipinfo question

2017-08-31 Thread Blake Eggleston
That's the value version. Gossip uses versioned values to work out which piece 
of data is the most recent. Each node has its own highest version, so I don't 
think it's unusual for that to be different for different nodes. When you say 
the node crashes, do you mean the process dies?

On August 29, 2017 at 4:20:30 PM, Gopal, Dhruva (dhruva.go...@aspect.com) wrote:

Hi –

  I have a question on the significance of a particular attribute in the output 
result for ‘nodetool gossipinfo’: RELEASE_VERSION. We have a node that is 
periodically crashing with nothing really of significance in the logs, and we’re 
trying to ascertain if it’s an OS issue or something to do with Cassandra. The 
output of ‘nodetool gossipinfo’ for this node was different from all other 
nodes in the cluster – we’re using 2.1.11 in this cluster and it was set to 
RELEASE_VERSION:6:2.1.11. In a subsequent run, after another crash, while 
attempting a repair, it was set to RELEASE_VERSION:7:2.1.11. All other nodes 
output RELEASE_VERSION:4:2.1.11. What is the significance of that first digit 
(the rest appears to be the version of Cassandra running on each node)?

 

 

Regards,

Dhruva

 


Re: Question about nodetool repair

2017-08-31 Thread Blake Eggleston
Specifying a dc will only repair the data in that dc. If you leave out the dc 
flag, it will repair data in both dcs. You probably shouldn't be restricting 
repair to one dc without a good rationale for doing so.

On August 31, 2017 at 8:56:24 AM, Harper, Paul (paul.har...@aspect.com) wrote:

Hello All,

 

I have a 6-node ring with 3 nodes in DC1 and 3 nodes in DC2. I sshed into node5 
on DC2, which was in a “DN” state, and ran “nodetool repair”. I’ve had this situation 
before and ran “nodetool repair -dc DC2”. I’m trying to understand what, if anything, is 
different between those commands. What are they actually doing?

 

Thanks


Re: Missing results during range query paging

2017-05-18 Thread Blake Eggleston
That does sound troubling. You mentioned you're reading at local quorum. Did 
you write these control records at quorum, or from the same dc at local quorum? 
What CL/DC are the other records written at?
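
One way to act on that question (a sketch in cqlsh; 'cf', 'primaryKey', and the key value are placeholders following the thread) is to read a known control row at the two consistency levels and compare:

CONSISTENCY LOCAL_QUORUM;
SELECT * FROM cf WHERE primaryKey = 'control-0001';

CONSISTENCY QUORUM;
SELECT * FROM cf WHERE primaryKey = 'control-0001';

If a control row is reliably visible at QUORUM but intermittently missing at LOCAL_QUORUM, the control data probably isn't fully replicated in the local data center, which would also show up as gaps in a LOCAL_QUORUM range scan.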

On May 17, 2017 at 10:16:42 AM, Dominic Chevalier (dccheval...@gmail.com) wrote:

Hi Folks, 

I've been noticing some missing rows, anywhere from 20-40% missing, while 
executing paging queries over my cluster. 

Basically, the query is intended to hit every row, subdividing the entire token range 
into a few tens of token ranges to parallelize the work (there is no wrap-around 
involved), at local_quorum:

select * from cf where token(primaryKey) > minimum and token(primaryKey) < 
maximum; 

I have inserted a test-control data set of 100,000 records, among billions of 
live records. The control data set does not change, does not TTL, and queries 
for individual rows at local_quorum return nearly all of the data, so it's very 
strange that paging queries consistently return 60-80% of what I expect. In the 
past, paging queries have returned almost all of the control data set, and 
still do in smaller test clusters. 

My suspicion is that something in the cluster state is impacting these results, but 
I have yet to pinpoint anything. Nor have I been able to pinpoint what in the 
past led from consistently 100% paging coverage to consistently a lot less 
than 100% coverage.

My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the 
local data center. Java driver version 3.1.0.

Thank you,
Dominic



Re: LCS, range tombstones, and eviction

2017-05-12 Thread Blake Eggleston
The start and end points of a range tombstone are basically stored as special 
purpose rows alongside the normal data in an sstable. As part of a read, 
they're reconciled with the data from the other sstables into a single 
partition, just like the other rows. The only difference is that they don't 
contain any 'real' data, and, of course, they prevent 'deleted' data from being 
returned in the read. It's a bit more complicated than that, but that's the 
general idea.
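
To make that concrete, here's the sort of delete that produces a single range tombstone rather than one tombstone per row (a sketch; the table and values are hypothetical):

-- A table with a compound primary key
CREATE TABLE ks.events (
    sensor_id uuid,
    day text,
    ts timeuuid,
    value double,
    PRIMARY KEY (sensor_id, day, ts)
);

-- Deleting by a clustering prefix writes one range tombstone whose start and
-- end bounds cover every row under (sensor_id, day)
DELETE FROM ks.events
WHERE sensor_id = 0f61ae50-4b49-11e7-b114-b2f933d5fe66
  AND day = '2017-05-01';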


On May 12, 2017 at 6:23:01 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

Thanks a lot Blake, that definitely helps!

I actually found a ticket re range tombstones and how they are accounted for: 
https://issues.apache.org/jira/browse/CASSANDRA-8527

I am wondering now what happens when a node receives a read request. Are the 
range tombstones read before scanning the SStables? More interestingly, given 
that a single partition might be split across different levels, and that some 
range tombstones might be in L0 while all the rest of the data is in L1, are all 
the tombstones prefetched from _all_ the involved SStables before doing any 
table scan?

Regards,
Stefano

On Thu, May 11, 2017 at 7:58 PM, Blake Eggleston <beggles...@apple.com> wrote:
Hi Stefano,

Based on what I understood reading the docs, if the ratio of garbage 
collectable tombstones exceeds the "tombstone_threshold", C* should start 
compacting and evicting.

If there are no other normal compaction tasks to be run, LCS will attempt to 
compact the sstables it estimates it will be able to drop the most tombstones 
from. It does this by estimating the number of tombstones an sstable has that 
have passed the gc grace period. Whether or not a tombstone will actually be 
evicted is more complicated. Even if a tombstone has passed gc grace, it can't 
be dropped if the data it's deleting still exists in another sstable, otherwise 
the data would appear to return. So, a tombstone won't be dropped if there is 
data for the same partition in other sstables that is older than the tombstone 
being evaluated for eviction.

I am quite puzzled however by what might happen when dealing with range 
tombstones. In that case a single tombstone might actually stand for an 
arbitrary number of normal tombstones. In other words, do range tombstones 
contribute to the "tombstone_threshold"? If so, how?

From what I can tell, each end of the range tombstone is counted as a single 
tombstone. So a range tombstone effectively contributes '2' to the 
count of tombstones for an sstable. I'm not 100% sure, but I haven't seen any 
sstable writing logic that tracks open tombstones and counts covered cells as 
tombstones. So, it's likely that the effect of range tombstones covering many 
rows are under represented in the droppable tombstone estimate.

I am also a bit confused by the "tombstone_compaction_interval". If I am 
dealing with a big partition in LCS which is receiving new records every day, 
and a weekly incremental repair job continuously anticompacting the data and 
thus creating SStables, what is the likelihood of the default interval 
(10 days) to be actually hit?

It will be hit, but probably only in the repaired data. Once the data is marked 
repaired, it shouldn't be anticompacted again, and should get old enough to 
pass the compaction interval. That shouldn't be an issue though, because you 
should be running repair often enough that data is repaired before it can ever 
get past the gc grace period. Otherwise you'll have other problems. Also, keep 
in mind that tombstone eviction is a part of all compactions, it's just that 
occasionally a compaction is run specifically for that purpose. Finally, you 
probably shouldn't run incremental repair on data that is deleted. There is a 
design flaw in the incremental repair used in pre-4.0 of cassandra that can 
cause consistency issues. It can also cause a *lot* of over streaming, so you 
might want to take a look at how much streaming your cluster is doing with full 
repairs, and incremental repairs. It might actually be more efficient to run 
full repairs.

Hope that helps,

Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

Hi all,

I am trying to wrap my head around how C* evicts tombstones when using LCS.
Based on what I understood reading the docs, if the ratio of garbage 
collectable tombstones exceeds the "tombstone_threshold", C* should start 
compacting and evicting.

I am quite puzzled however by what might happen when dealing with range 
tombstones. In that case a single tombstone might actually stand for an 
arbitrary number of normal tombstones. In other words, do range tombstones 
contribute to the "tombstone_threshold"? If so, how?

I am also a bit confused by the "tombstone_compaction_interval". If I am 
dealing with a big partition in LCS which is receiving new records every day, 
and a weekly incremental repair job continuously anticompacting the data and 
thus creating SStables, what is the likelihood of the default interval 
(10 days) to be actually hit?

Re: AWS Cassandra backup/Restore tools

2017-05-11 Thread Blake Eggleston
OpsCenter 6.0 and up only work with DataStax Enterprise, not with open-source Apache Cassandra.

On May 11, 2017 at 12:31:08 PM, cass savy (casss...@gmail.com) wrote:

AWS Backup/Restore process/tools for C*/DSE C*:

Has anyone used Opscenter 6.1 backup tool to backup/restore data for larger 
datasets online ?

If yes, did you run into issues using that tool to backup/restore data in PROD 
that caused any performance or any other impact to the cluster?

If no, what are other tools that people have used or recommended for backup and 
restore of Cassandra keyspaces?

Please advise.




Re: LCS, range tombstones, and eviction

2017-05-11 Thread Blake Eggleston
Hi Stefano,

Based on what I understood reading the docs, if the ratio of garbage 
collectable tombstones exceeds the "tombstone_threshold", C* should start 
compacting and evicting.

If there are no other normal compaction tasks to be run, LCS will attempt to 
compact the sstables it estimates it will be able to drop the most tombstones 
from. It does this by estimating the number of tombstones an sstable has that 
have passed the gc grace period. Whether or not a tombstone will actually be 
evicted is more complicated. Even if a tombstone has passed gc grace, it can't 
be dropped if the data it's deleting still exists in another sstable, otherwise 
the data would appear to return. So, a tombstone won't be dropped if there is 
data for the same partition in other sstables that is older than the tombstone 
being evaluated for eviction.

I am quite puzzled however by what might happen when dealing with range 
tombstones. In that case a single tombstone might actually stand for an 
arbitrary number of normal tombstones. In other words, do range tombstones 
contribute to the "tombstone_threshold"? If so, how?

From what I can tell, each end of the range tombstone is counted as a single 
tombstone. So a range tombstone effectively contributes '2' to the 
count of tombstones for an sstable. I'm not 100% sure, but I haven't seen any 
sstable writing logic that tracks open tombstones and counts covered cells as 
tombstones. So, it's likely that the effect of range tombstones covering many 
rows are under represented in the droppable tombstone estimate.

I am also a bit confused by the "tombstone_compaction_interval". If I am 
dealing with a big partition in LCS which is receiving new records every day, 
and a weekly incremental repair job continuously anticompacting the data and 
thus creating SStables, what is the likelihood of the default interval 
(10 days) to be actually hit?

It will be hit, but probably only in the repaired data. Once the data is marked 
repaired, it shouldn't be anticompacted again, and should get old enough to 
pass the compaction interval. That shouldn't be an issue though, because you 
should be running repair often enough that data is repaired before it can ever 
get past the gc grace period. Otherwise you'll have other problems. Also, keep 
in mind that tombstone eviction is a part of all compactions, it's just that 
occasionally a compaction is run specifically for that purpose. Finally, you 
probably shouldn't run incremental repair on data that is deleted. There is a 
design flaw in the incremental repair used in pre-4.0 of cassandra that can 
cause consistency issues. It can also cause a *lot* of over streaming, so you 
might want to take a look at how much streaming your cluster is doing with full 
repairs, and incremental repairs. It might actually be more efficient to run 
full repairs.
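
For reference, both of the thresholds discussed above are compaction subproperties that can be tuned per table (a sketch; the table name and values are illustrative, not recommendations):

ALTER TABLE ks.events WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'tombstone_threshold': '0.2',
    'tombstone_compaction_interval': '86400'
};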

Hope that helps,

Blake

On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostef...@gmail.com) wrote:

Hi all,

I am trying to wrap my head around how C* evicts tombstones when using LCS.
Based on what I understood reading the docs, if the ratio of garbage 
collectable tombstones exceeds the "tombstone_threshold", C* should start 
compacting and evicting.

I am quite puzzled however by what might happen when dealing with range 
tombstones. In that case a single tombstone might actually stand for an 
arbitrary number of normal tombstones. In other words, do range tombstones 
contribute to the "tombstone_threshold"? If so, how?

I am also a bit confused by the "tombstone_compaction_interval". If I am 
dealing with a big partition in LCS which is receiving new records every day, 
and a weekly incremental repair job continuously anticompacting the data and 
thus creating SStables, what is the likelihood of the default interval 
(10 days) to be actually hit?

Hopefully somebody will be able to shed some lights here!

Thanks in advance! 
Stefano 



Re: massive spikes in read latency

2014-01-06 Thread Blake Eggleston
That’s a good point. CPU steal time is very low, but I haven’t observed 
internode ping times during one of the peaks, so I’ll have to check that out. 
Another thing I’ve noticed is that Cassandra starts dropping read messages 
during the spikes, as reported by tpstats. This indicates that there are too many 
queries for Cassandra to handle. However, as I mentioned earlier, the spikes 
aren’t correlated with an increase in reads.

On Jan 5, 2014, at 3:28 PM, Blake Eggleston bl...@shift.com wrote:

 Hi,
 
 I’ve been having a problem with 3 neighboring nodes in our cluster having 
 their read latencies jump up to 9000ms - 18000ms for a few minutes (as 
 reported by opscenter), then come back down.
 
 We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra 
 reading and writing to 2 raided ssds.
 
 I’ve added 2 nodes to the struggling part of the cluster, and aside from the 
 latency spikes shifting onto the new nodes, it has had no effect. I suspect 
 that a single key that lives on the first stressed node may be being read 
 from heavily.
 
 The spikes in latency don’t seem to be correlated to an increase in reads. 
 The cluster’s workload is usually handling a maximum workload of 4200 
 reads/sec per node, with writes being significantly less, at ~200/sec per 
 node. Usually it will be fine with this, with read latencies at around 3.5-10 
 ms/read, but once or twice an hour the latencies on the 3 nodes will shoot 
 through the roof. 
 
 The disks aren’t showing serious use, with read and write rates on the ssd 
 volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra 
 process is maintaining 1000-1100 open connections. GC logs aren’t showing any 
 serious gc pauses.
 
 Any ideas on what might be causing this?
 
 Thanks,
 
 Blake



massive spikes in read latency

2014-01-05 Thread Blake Eggleston
Hi,

I’ve been having a problem with 3 neighboring nodes in our cluster having their 
read latencies jump up to 9000ms - 18000ms for a few minutes (as reported by 
opscenter), then come back down.

We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra 
reading and writing to 2 raided ssds.

I’ve added 2 nodes to the struggling part of the cluster, and aside from the 
latency spikes shifting onto the new nodes, it has had no effect. I suspect 
that a single key that lives on the first stressed node may be being read from 
heavily.

The spikes in latency don’t seem to be correlated with an increase in reads. The 
cluster usually handles a maximum workload of 4200 reads/sec per 
node, with writes being significantly less, at ~200/sec per node. Usually it 
will be fine with this, with read latencies at around 3.5-10 ms/read, but once 
or twice an hour the latencies on the 3 nodes will shoot through the roof. 

The disks aren’t showing serious use, with read and write rates on the ssd 
volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra process 
is maintaining 1000-1100 open connections. GC logs aren’t showing any serious 
gc pauses.

Any ideas on what might be causing this?

Thanks,

Blake

Re: get all row keys of a table using CQL3

2013-07-23 Thread Blake Eggleston
Hi Jimmy,

Check out the token function:

http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results

You can use it to page through your rows.
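
A minimal sketch of the pattern, using the table and column from your question (the LIMIT and the example timeuuid value are arbitrary):

-- first page
SELECT id FROM mytable LIMIT 1000;

-- next page: restart from the token of the last id returned above
SELECT id FROM mytable
WHERE token(id) > token(50554d6e-29bb-11e5-b345-feff819cdc9f)
LIMIT 1000;

Repeat until a page comes back with fewer rows than the limit.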

Blake


On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:

 hi,
 I want to fetch all the row keys of a table using CQL3:
  
 e.g
 select id from mytable limit 999
  
  
 #1
 For this query, does the node need to wait for all rows to return from all other 
 nodes before returning the data to the client (I am using Astyanax)?
 In other words, will this operation create a lot of load on the initial node 
 receiving the request?
  
  
 #2
 if my table is big, I have to make sure the limit is set to a big enough 
 number, such that I can get all the results. It seems like I have to do a 
 count(*) to be sure.
 Is there any alternative (always return all the rows)?
  
 #3
 if my id is a timeuuid, is it better to combine the results from a couple of 
 the following CQL queries to obtain all keys?
 e.g
 select id from mytable where id t  minTimeuuid('2013-02-02 10:00+') 
 limit 2
 +
 select id from mytable where id t  maxTimeuuid('2013-02-02 10:00+') 
 limit 2
  
 thanks
 
  
  
  



columns disappearing intermittently

2013-07-03 Thread Blake Eggleston
Hi All,

We're having a problem with our cassandra cluster and are at a loss as to the 
cause.

We have what appear to be columns that disappear for a little while, then 
reappear. The rest of the row is returned normally during this time. This is, 
of course, very disturbing, and is wreaking havoc with our application.

A bit more info about what's happening:

We are repeatedly executing the same query against our cluster. Every so often, 
one of the columns will disappear from the row and will remain gone for some 
time. Then after continually executing the same query, the column will come 
back. The queries are being executed against a 3 node cluster, with a 
replication factor of 3, and all reads and writes are done with a quorum 
consistency level.

We upgraded from cassandra 1.1.12 to 1.2.6 last week, but only started seeing 
issues this morning.

Has anyone had a problem like this before, or have any idea what might be 
causing it?

Re: How to query secondary indexes

2012-11-28 Thread Blake Eggleston
You're going to have a problem doing this in a single query because you're
asking cassandra to select a non-contiguous set of rows. Also, to my
knowledge, you can only use non-equality operators on clustering keys. The
best solution I could come up with would be to define you table like so:

CREATE TABLE room_data (
room_id uuid,
in_room int,
temp float,
time timestamp,
PRIMARY KEY (room_id, in_room, temp));

Then run 2 queries:
SELECT * FROM room_data WHERE in_room > 7;
SELECT * FROM room_data WHERE temp > 50.0;

And do an intersection on the results.

I should add the disclaimer that I am relatively new to CQL, so there may
be a better way to do this.

Blake


On Wed, Nov 28, 2012 at 10:02 AM, Oren Karmi oka...@gmail.com wrote:

 Hi,

 According to the documentation on Indexes (
 http://www.datastax.com/docs/1.1/ddl/indexes ),
 in order to use WHERE on a column which is not part of my key, I must
 define a secondary index on it. However, I can only use equality comparisons
 on it, but I wish to use other comparison methods like greater-than.

 Let's say I have a room with people, and at every timestamp I measure
 the temperature of the room and the number of people. I use the timestamp as my
 key and I want to select all timestamps where the temperature was over 50
 degrees, but I can't seem to be able to do it with a regular query even if I
 define that column as a secondary index.
 SELECT * FROM MyTable WHERE temp > 50.4571;

 My lame workaround is to define a secondary index on NumOfPeopleInRoom and
 then query for a specific value:
 SELECT * FROM MyTable WHERE NumOfPeopleInRoom = 7 AND temp > 50.4571;

 I'm pretty sure this is not the proper way for me to do this.

 How should I attack this? It feels like I'm missing a very basic concept.
 I'd appreciate it if your answers also include the option of not changing
 my schema.

 Thanks!!!