Re: Convert single node C* to cluster (rebalancing problem)

2017-06-14 Thread Affan Syed
John,

I am a co-worker of Junaid's -- he is out sick, so I just wanted to confirm
that one of your shots in the dark is correct. The RF is indeed 1x:

"CREATE KEYSPACE orion WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}  AND durable_writes = true;"

However, how does the RF affect the redistribution of key/data?

Affan


On Wed, Jun 14, 2017 at 1:16 AM, John Hughes  wrote:

> OP, I was just looking at your original numbers and I have some questions:
>
> 270GB on one node and 414KB on the other, but something close to 50/50 on
> "Owns(effective)".
> What replication factor are your keyspaces set up with? 1x or 2x or ??
>
> I would say you are seeing 50/50 because the tokens are allocated
> 50/50 (others on the list, please correct what are for me really just
> assumptions), but I would hazard a guess that your replication factor
> is still 1x, so it isn't moving anything around. Or your keyspace
> replication is incorrect and isn't being distributed (I have had issues with
> the AWSMultiRegionSnitch and not getting the region correct [us-east vs
> us-east-1]). It doesn't throw an error, but it doesn't work very well either
> =)
>
> Can you do a 'describe keyspace XXX' and show the first line (the CREATE
> KEYSPACE line)?
>
> Mind you, these are all just shots in the dark from here.
>
> Cheers,
>
>
> On Tue, Jun 13, 2017 at 3:13 AM Junaid Nasir  wrote:
>
>> Is the OP expecting a perfect 50%/50% split?
>>
>>
>> The best result I got was a 240GB/30GB split, which I think is not properly
>> balanced.
>>
>>
>>> Also, what are your outputs when you call out specific keyspaces? Do the
>>> numbers get more even?
>>
>>
>> I don't know what you mean by *call out specific keyspaces* -- can you
>> please explain that a bit?
>>
>>
>> If your schema is not modelled correctly you can easily end up unevenly
>>> distributed data.
>>
>>
>> I think that is the problem. The initial 270GB of data might not be modeled
>> correctly. I have run a lot of tests on the 270GB dataset, including
>> downsizing it to 5GB, and they all resulted in the same uneven distribution.
>> I also tested a dummy dataset of 2GB, which was balanced evenly. Coming from
>> an RDBMS, I didn't give much thought to data modeling. Can anyone please
>> point me to some resources regarding this problem?
>>
>> On Tue, Jun 13, 2017 at 3:24 AM, Akhil Mehra 
>> wrote:
>>
>>> Great point John.
>>>
>>> The OP should also note that data distribution also depends on your
>>> schema and incoming data profile.
>>>
>>> If your schema is not modelled correctly you can easily end up unevenly
>>> distributed data.
>>>
>>> Cheers,
>>> Akhil
>>>
>>> On Tue, Jun 13, 2017 at 3:36 AM, John Hughes 
>>> wrote:
>>>
 Is the OP expecting a perfect 50%/50% split? That, in my experience, is
 not going to happen; it is almost always shifted by anywhere from a fraction
 of a percent to a couple of percent.

 Datacenter: eu-west
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
 UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
 UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
 UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
 UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
 UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b

 Though maybe part of what you are experiencing can be cleared up by
 repair/compaction/cleanup. Also, what are your outputs when you call out
 specific keyspaces? Do the numbers get more even?

 Cheers,

 On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra 
 wrote:

> auto_bootstrap is true by default. Ensure it's set to true. On startup,
> look at your logs for your auto_bootstrap value -- check the node
> configuration line in your log file.
>
> Akhil
>
> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir  wrote:
>
>> No, I didn't set it (left it at default value)
>>
>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A  wrote:
>>
>>> Did you make sure auto_bootstrap property is indeed set to [true]
>>> when you added the node?
>>>
>>>
>>>
>>> *From:* Junaid Nasir [mailto:jna...@an10.io]
>>> *Sent:* Monday, June 05, 2017 6:29 AM
>>> *To:* Akhil Mehra 
>>> *Cc:* Vladimir Yudovin ;
>>> user@cassandra.apache.org
>>> *Subject:* Re: Convert single node C* to cluster (rebalancing
>>> problem)

Re: Apache Cassandra - Memory usage on server

2017-06-14 Thread Thakrar, Jayesh
Asad,

The remaining ~42 GB of memory on your server is used by the filesystem buffer
cache - see the "cached" column and the -/+ buffers/cache line.

The OS (Linux) uses all free memory for the filesystem buffer cache and will
relinquish it as applications need memory.

To see the actual memory footprint of Cassandra, run the command "nodetool -h  info".

You will see something like what is shown below. Your C* memory usage is the
sum of the heap and off-heap memory (in the output below that is roughly
645.81 MB heap + 3.21 MB off-heap, against a ~4 GB maximum heap).

ID                     : d4eba1d4-95e7-426c-91c6-8ee57bd4d1a7
Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 130.63 MiB
Generation No          : 1495544782
Uptime (seconds)       : 583743
Heap Memory (MB)       : 645.81 / 4016.00
Off Heap Memory (MB)   : 3.21
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 76, size 6.92 KiB, capacity 100 MiB, 3995 hits, 4079 requests, 0.979 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 4, size 256 KiB, capacity 480 MiB, 549 misses, 4633 requests, 0.882 recent hit rate, NaN microseconds miss latency
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 256 tokens)


From: "ZAIDI, ASAD A" 
Date: Wednesday, June 14, 2017 at 6:23 PM
To: "user@cassandra.apache.org" 
Subject: Apache Cassandra - Memory usage on server

Hi folks,

I’m using Apache Cassandra 2.2.

The instance is configured with max_heap_size set at 16G and
memtable_allocation_type set to offheap_objects – total available memory on
the server is 62G.
Nothing but Cassandra is running on my Linux server.

My Cassandra instance is consuming all available memory on my machine!


$ free -h
             total       used       free     shared    buffers     cached
Mem:           62G        62G       776M       184K        35M        42G
-/+ buffers/cache:        19G        43G
Swap:         4.0G         0B       4.0G

Can you guys please suggest how I can limit the memory usage of my Cassandra
instance, and why C* may be using all available memory?
I would much appreciate your suggestions.

Thanks/Asad



Re: Question: Large partition warning

2017-06-14 Thread Fay Hou [Data Pipeline & Real-time Analytics] ­
 You really should keep partition sizes under 100MB.

On Wed, Jun 14, 2017 at 7:28 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> Thank you Kurt - that makes sense.
> Will certainly reduce it to 1024.
>
> Greatly appreciate your quick reply.
>
> Thanks,
>
> Jayesh
>
>
>
> From: kurt greaves
> Sent: Wednesday, June 14, 5:53 PM
> Subject: Re: Question: Large partition warning
> To: Fay Hou [Data Pipeline & Real-time Analytics] ­
> Cc: Thakrar, Jayesh, User
>
>
> Looks like you've hit a bug (not the first time I've seen this in relation
> to C* configs). compaction_large_partition_warning_threshold_mb resolves
> to an int, and in the codebase is represented in bytes. 4096 * 1024 * 1024
> and you've got some serious overflow. Granted, you should have this warning
> set considerably lower than 4096MB anyway. Try setting it to 1024MB. Looks like
> this affects multiple versions so I'll create a JIRA/patch soon.
>
>
> Relevant code:
>
>  public static int getCompactionLargePartitionWarningThreshold() { return
> conf.compaction_large_partition_warning_threshold_mb * 1024 * 1024; }
>
>
>
>


Re: Bottleneck for small inserts?

2017-06-14 Thread Eric Pederson
Using cassandra-stress with the out of the box schema I am seeing around
140k rows/second throughput using 1 client on each of 3 client machines.
On the servers:

   - CPU utilization: 43% usr/20% sys, 55%/28%, 70%/10% (the last number is
   the older box)
   - Inbound network traffic: 174 Mbps, 190 Mbps, 178 Mbps
   - Disk writes/sec: ~10k each server
   - Disk utilization is in the low single digits but spikes up to 50%
   - Disk queue size is in the low single digits but spikes up into the mid
   hundreds; I even saw spikes into the thousands. I had not noticed this before.

The disk stats come from iostat -xz 1.   Given the low reported utilization
%s I would not expect to see any disk queue buildup, even low single digits.

Going to 2 cassandra-stress clients per machine the throughput dropped to
133k rows/sec.

   - CPU utilization: 13% usr/5% sys, 15%/25%, 40%/22% on the older box
   - Inbound network RX: 100Mbps, 125Mbps, 120Mbps
   - Disk utilization is a little lower, but with the same spiky behavior

Going to 3 cassandra-stress clients per machine the throughput dropped to
110k rows/sec

   - CPU utilization: 15% usr/20% sys,  15%/20%, 40%/20% on the older box
   - Inbound network RX dropped to 130 Mbps
   - Disk utilization stayed roughly the same

I noticed that with the standard cassandra-stress schema GC is not an
issue.   But with my application-specific schema there is a lot of GC on
the slower box.  Also with the application-specific schema I can't seem to
get past 36k rows/sec.   The application schema has 64 columns (mostly
ints) and the key is (date,sequence#).   The standard stress schema has a
lot fewer columns and no clustering column.

Thanks,



-- Eric

On Wed, Jun 14, 2017 at 1:47 AM, Eric Pederson  wrote:

> Shoot - I didn't see that one.  I subscribe to the digest but was focusing
> on the direct replies and accidentally missed Patrick and Jeff Jirsa's
> messages.  Sorry about that...
>
> I've been using a combination of cassandra-stress, cqlsh COPY FROM and a
> custom C++ application for my ingestion testing.   My default setting for
> my custom client application is 96 threads, and then by default I run one
> client application process on each of 3 machines.  I tried
> doubling/quadrupling the number of client threads (and doubling/tripling
> the number of client processes but keeping the threads per process the
> same) but didn't see any change.   If I recall correctly I started getting
> timeouts after I went much beyond concurrent_writes which is 384 (for a 48
> CPU box) - meaning at 500 threads per client machine I started seeing
> timeouts. I'll try again to be sure.
>
> For the purposes of this conversation I will try to always use
> cassandra-stress to keep the number of unknowns limited.  I will run
> more cassandra-stress clients tomorrow in line with Patrick's 3-5 per
> server recommendation.
>
> Thanks!
>
>
> -- Eric
>
> On Wed, Jun 14, 2017 at 12:40 AM, Jonathan Haddad 
> wrote:
>
>> Did you try adding more client stress nodes as Patrick recommended?
>>
>> On Tue, Jun 13, 2017 at 9:31 PM Eric Pederson  wrote:
>>
>>> Scratch that theory - the flamegraphs show that GC is only 3-4% of the two
>>> newer machines' overall processing, compared to 18% on the slow machine.
>>>
>>> I took that machine out of the cluster completely and recreated the
>>> keyspaces.  The ingest tests now run slightly faster (!).   I would have
>>> expected a linear slowdown since the load is fairly balanced across
>>> partitions.  GC appears to be the bottleneck in the 3-server
>>> configuration.  But still in the two-server configuration the
>>> CPU/disk/network is still not being fully utilized (the closest is CPU at
>>> ~45% on one ingest test).  nodetool tpstats shows only blips of
>>> queueing.
>>>
>>>
>>>
>>>
>>> -- Eric
>>>
>>> On Mon, Jun 12, 2017 at 9:50 PM, Eric Pederson 
>>> wrote:
>>>
 Hi all - I wanted to follow up on this.  I'm happy with the throughput
 we're getting but I'm still curious about the bottleneck.

 The big thing that sticks out is that one of the nodes is logging frequent
 GCInspector messages: 350-500ms every 3-6 seconds.  All three nodes in
 the cluster have identical Cassandra configuration, but the node that is
 logging frequent GCs is an older machine with slower CPU and SSD.  This
 node logs frequent GCInspectors both under load and when compacting
 but otherwise unloaded.

 My theory is that the other two nodes have similar GC frequency
 (because they are seeing the same basic load), but because they are faster
 machines, they don't spend as much time per GC and don't cross the
 GCInspector threshold.  Does that sound plausible?   nodetool tpstats
 doesn't show any queueing in the system.

 Here are flamegraphs from the system when running a cqlsh COPY FROM:

- 

Re: Question: Large partition warning

2017-06-14 Thread Thakrar, Jayesh
Thank you Kurt - that makes sense.
Will certainly reduce it to 1024.

Greatly appreciate your quick reply.

Thanks,

Jayesh



From: kurt greaves
Sent: Wednesday, June 14, 5:53 PM
Subject: Re: Question: Large partition warning
To: Fay Hou [Data Pipeline & Real-time Analytics] ­
Cc: Thakrar, Jayesh, User


Looks like you've hit a bug (not the first time I've seen this in relation to 
C* configs). compaction_large_partition_warning_threshold_mb resolves to an 
int, and in the codebase is represented in bytes. 4096 * 1024 * 1024 and you've 
got some serious overflow. Granted, you should have this warning set 
considerably lower than 4096MB anyway. Try setting it to 1024MB. Looks like this 
affects multiple versions so I'll create a JIRA/patch soon.


Relevant code:

 public static int getCompactionLargePartitionWarningThreshold() { return 
conf.compaction_large_partition_warning_threshold_mb * 1024 * 1024; }





RE: Question: Large partition warning

2017-06-14 Thread ZAIDI, ASAD A
Check the partition sizes of your table (nodetool tablehistograms
tsg_ae logs_by_user).
You may need to reduce the partition size of your table using the guidelines given in [
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/planning/planPlanningPartitionSize.html]
 & 
[https://stackoverflow.com/questions/20512710/cassandra-has-a-limit-of-2-billion-cells-per-partition-but-whats-a-partition]




From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: Wednesday, June 14, 2017 3:14 PM
To: User 
Subject: Question: Large partition warning

We are on Cassandra 2.2.5 and I am constantly seeing warning messages about 
large partitions in system.log even though our setting for partition warning 
threshold is set to 4096 (MB).

WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,189 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:114303419784957147 (17187 bytes)
WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,190 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:820590487870502244 (2613 bytes)
WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,191 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:421586605444858333 (54923 bytes)

Here's what we have in cassandra.yaml.

# Log a warning when compacting partitions larger than this value
compaction_large_partition_warning_threshold_mb: 4096

So I am wondering: is such a warning normal, is there something wrong in our
configuration, or is it a bug?

I have looked at 
https://issues.apache.org/jira/browse/CASSANDRA-9643.

Any pointers will be greatly appreciated.

Thanks,
Jayesh



Reaper v0.6.1 released

2017-06-14 Thread Jonathan Haddad
Hey folks!

I'm proud to announce the 0.6.1 release of the Reaper project, the open
source repair management tool for Apache Cassandra.

This release improves the Cassandra backend significantly, making it a
first class citizen for storing repair schedules and managing repair
progress.  It's no longer necessary to manage a PostgreSQL DB in addition
to your Cassandra DB.

We've been very active since we forked the original Spotify repo.  Since
this time we've added:

* A native Cassandra backend
* Support for versions > 2.0
* Merged in the WebUI, maintained by Stefan Podkowinski (
https://github.com/spodkowinski/cassandra-reaper-ui)
* Support for incremental repair (probably best to avoid till Cassandra
4.0, see CASSANDRA-9143)

We're excited to continue making improvements past the original intent of
the project.  With the lack of Cassandra 3.0 support in OpsCenter, there's
a gap that needs to be filled for tools that help with managing a cluster.
Alex Dejanovski showed me a prototype he recently put together for a really
nice view into cluster health.  We're also looking to add support for common
cluster operations like snapshots, upgradesstables, cleanup, and setting
options at runtime.

Grab it here: https://github.com/thelastpickle/cassandra-reaper

Feedback / bug reports / ideas are very much appreciated.

We have a dedicated, low traffic ML here:
https://groups.google.com/forum/#!forum/tlp-apache-cassandra-reaper-users

Jon Haddad
Principal Consultant, The Last Pickle
http://thelastpickle.com/


Apache Cassandra - Memory usage on server

2017-06-14 Thread ZAIDI, ASAD A
Hi folks,

I’m using Apache Cassandra 2.2.

The instance is configured with max_heap_size set at 16G and
memtable_allocation_type set to offheap_objects – total available memory on
the server is 62G.
Nothing but Cassandra is running on my Linux server.

My Cassandra instance is consuming all available memory on my machine!

$ free -h
             total       used       free     shared    buffers     cached
Mem:           62G        62G       776M       184K        35M        42G
-/+ buffers/cache:        19G        43G
Swap:         4.0G         0B       4.0G

Can you guys please suggest how I can limit the memory usage of my Cassandra
instance, and why C* may be using all available memory?
I would much appreciate your suggestions.

Thanks/Asad



Re: Question: Large partition warning

2017-06-14 Thread kurt greaves
Looks like you've hit a bug (not the first time I've seen this in relation
to C* configs). compaction_large_partition_warning_threshold_mb resolves to
an int, and in the codebase is represented in bytes. 4096 * 1024 * 1024 and
you've got some serious overflow. Granted, you should have this warning set
considerably lower than 4096MB anyway. Try setting it to 1024MB. Looks like
this affects multiple versions so I'll create a JIRA/patch soon.

Relevant code:
 public static int getCompactionLargePartitionWarningThreshold() { return
conf.compaction_large_partition_warning_threshold_mb * 1024 * 1024; }
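
For anyone curious, here is a minimal standalone sketch (not taken from the
Cassandra codebase) that reproduces the wrap-around with a setting of 4096:

    public class LargePartitionThresholdOverflow {
        public static void main(String[] args) {
            int thresholdMb = 4096;
            // 4096 * 1024 * 1024 = 2^32, which wraps to 0 in 32-bit int arithmetic,
            // so every partition "exceeds" the threshold and triggers the warning.
            int thresholdBytes = thresholdMb * 1024 * 1024;
            System.out.println(thresholdBytes);          // prints 0
            System.out.println(1024 * 1024 * 1024);      // 1024 MB fits in an int: 1073741824
            // Widening to long (or simply using a smaller setting) avoids the overflow.
            long safeBytes = thresholdMb * 1024L * 1024L;
            System.out.println(safeBytes);               // prints 4294967296
        }
    }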


Correct ways to use Nodetool JMX Classes in Seperate Process

2017-06-14 Thread Nathan Jackels
Hi all,

A project I'm working on right now requires that a daemon/service running
on the same host as Cassandra be able to connect via JMX for many of the
same functions as nodetool and sstablemetadata.
The classpath that nodetool uses includes all the jars in cassandra/lib, so
we are using the same list.

The issue that we're running into is that eventually some of the classes in
org.apache.cassandra.db.commitlog are loaded, threads are created and
started in the static blocks and the process opens a handle/fd on the
commitlog files.
It will also create a new log file if the commitlog_directory is empty.

Right now we're getting around this by putting a jar with no-op
implementations of nearly everything in the db.commitlog package on the
classpath before the cassandra/lib jars.
I've created CASSANDRA-13605 because it seems possible that this could be
observed by a long-running nodetool repair or compaction command, but in
the meantime are there any more common practices than this kludge we've
implemented?

We want to call public methods from the Cassandra internals instead of
creating nodetool processes and parsing stdout.
Adding a jar to the Cassandra classpath as a proxy is also not preferred
because of the way our product dependencies are handled internally at the
company.
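
For reference, a minimal sketch of what a plain remote JMX client looks like
(untested, and assuming the default JMX port 7199 with no JMX authentication).
Reading MBean attributes by name keeps Cassandra classes off the client
classpath entirely, though it does not cover the sstablemetadata side of
things:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxClientSketch {
        public static void main(String[] args) throws Exception {
            // Cassandra's JMX endpoint; 7199 is the default port.
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName storageService =
                    new ObjectName("org.apache.cassandra.db:type=StorageService");
                // Read attributes by name instead of proxying StorageServiceMBean,
                // so no Cassandra jars are needed on this side.
                System.out.println("Version:    " + mbs.getAttribute(storageService, "ReleaseVersion"));
                System.out.println("Live nodes: " + mbs.getAttribute(storageService, "LiveNodes"));
            }
        }
    }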

Thanks,
Nathan


Re: Question: Large partition warning

2017-06-14 Thread Fay Hou [Data Pipeline & Real-time Analytics] ­
 nodetool tablehistograms tsg_ae/logs_by_user will give you an idea of the
estimated partition sizes. It is recommended that the partition size not
exceed 100MB.

Large partitions also create heap pressure during compactions, which
will issue warnings in the logs (look for "large partition").

Thanks,

Fay

On Wed, Jun 14, 2017 at 1:13 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> We are on Cassandra 2.2.5 and I am constantly seeing warning messages
> about large partitions in system.log even though our setting for partition
> warning threshold is set to 4096 (MB).
>
>
>
> WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,189
> BigTableWriter.java:184 - Writing large partition 
> tsg_ae/logs_by_user:114303419784957147
> (17187 bytes)
>
> WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,190
> BigTableWriter.java:184 - Writing large partition 
> tsg_ae/logs_by_user:820590487870502244
> (2613 bytes)
>
> WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,191
> BigTableWriter.java:184 - Writing large partition 
> tsg_ae/logs_by_user:421586605444858333
> (54923 bytes)
>
>
>
> Here's what we have in cassandra.yaml.
>
>
>
> # Log a warning when compacting partitions larger than this value
>
> compaction_large_partition_warning_threshold_mb: 4096
>
>
>
> So I am wondering: is such a warning normal, is there something wrong in
> our configuration, or is it a bug?
>
>
>
> I have looked at https://issues.apache.org/jira/browse/CASSANDRA-9643.
>
>
>
> Any pointers will be greatly appreciated.
>
>
>
> Thanks,
>
> Jayesh
>
>
>


Question: Large partition warning

2017-06-14 Thread Thakrar, Jayesh
We are on Cassandra 2.2.5 and I am constantly seeing warning messages about 
large partitions in system.log even though our setting for partition warning 
threshold is set to 4096 (MB).

WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,189 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:114303419784957147 (17187 bytes)
WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,190 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:820590487870502244 (2613 bytes)
WARN  [CompactionExecutor:43180] 2017-06-14 20:02:13,191 
BigTableWriter.java:184 - Writing large partition 
tsg_ae/logs_by_user:421586605444858333 (54923 bytes)

Here's what we have in cassandra.yaml.

# Log a warning when compacting partitions larger than this value
compaction_large_partition_warning_threshold_mb: 4096

So I am wondering: is such a warning normal, is there something wrong in our
configuration, or is it a bug?

I have looked at https://issues.apache.org/jira/browse/CASSANDRA-9643.

Any pointers will be greatly appreciated.

Thanks,
Jayesh



Re: Node replacement strategy with AWS EBS

2017-06-14 Thread Hannu Kröger
Hi,

So, if it works, great.

Setting auto_bootstrap to false is not needed when you have the system
keyspace, as also mentioned in the article. Now you are likely to have
different tokens than the previous node (unless those were manually configured
to match the old node), and repair and cleanup are needed to get that node
into the “right” state. But if the tokens were configured to match, then
repair & cleanup are not needed.

Cheers,
Hannu

On 14 June 2017 at 13:37:29, Rutvij Bhatt (rut...@sense.com) wrote:

Thanks again for your help! To summarize for anyone who stumbles onto this
in the future, this article covers the procedure well:
https://www.eventbrite.com/engineering/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

It is more or less what Hannu suggested.

I carried out the following steps:
1. Safely stop the Cassandra instance (nodetool drain + service cassandra
stop).
2. Shut down the EC2 instance.
3. Detach the storage volume from the old instance.
4. Attach it to the new instance.
5. Point the Cassandra configuration on the new instance at this drive and set
auto_bootstrap: false.
6. Start Cassandra on the new instance. Once it has established connections
with its peers, you will notice that it takes over the token ranges on its
own. Doing a select on the system.peers table will show that the old node is
gone.
7. Run nodetool repair if need be.

On Tue, Jun 13, 2017 at 1:01 PM Rutvij Bhatt  wrote:

> Never mind, I misunderstood the first link. In this case, the replacement
> would just be leaving the listen_address as is (i.e.
> InetAddress.getLocalHost()) and starting the new instance up as you
> pointed out in your original answer, Hannu.
>
> Thanks.
>
> On Tue, Jun 13, 2017 at 12:35 PM Rutvij Bhatt  wrote:
>
>> Hannu/Nitan,
>>
>> Thanks for your help so far! From what you said in your first response, I
>> can get away with just attaching the EBS volume to Cassandra and starting
>> it with the old node's private IP as my listen_address because it will take
>> over the token assignment from the old node using the data files? With
>> regards to "Cassandra automatically realizes that have just effectively
>> changed IP address.", it says in the first link to change this manually to
>> the desired address - does this not apply in my case if I'm replacing the
>> old node?
>>
>> As for the plan I outlined earlier, is this more for DR scenarios where I
>> have lost a node due to hardware failure and I need to recover the data in
>> a safe manner by requesting a stream from the other replicas?  Am I
>> understanding this right?
>>
>>
>> On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger  wrote:
>>
>>> Hello,
>>>
>>> So the local information about tokens is stored in the system keyspace.
>>> Also the host id and all that.
>>>
>>> Also documented here:
>>>
>>> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>>>
>>> If for any reason that causes issues, you can also check this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-8382
>>>
>>> If you copy all Cassandra data, you are on the safe side. A good point in
>>> the links is that if you have IP addresses in topology or other files, then
>>> update those as well.
>>>
>>> Hannu
>>>
>>> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>>>
>>> Hannu,
>>>
>>> "Cassandra automatically realizes that have just effectively changed IP
>>> address” —> are you sure C* will take care of IP change as is? How will it
>>> know which token range to be assigned to this new IP address?
>>>
>>> On Jun 13, 2017, at 10:51 AM, Hannu Kröger  wrote:
>>>
>>> Cassandra automatically realizes that you have just effectively changed IP
>>> address
>>>
>>>
>>>


Re: Node replacement strategy with AWS EBS

2017-06-14 Thread Rutvij Bhatt
Thanks again for your help! To summarize for anyone who stumbles onto this
in the future, this article covers the procedure well:
https://www.eventbrite.com/engineering/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

It is more or less what Hannu suggested.

I carried out the following steps:
1. Safely stop the Cassandra instance (nodetool drain + service cassandra
stop).
2. Shut down the EC2 instance.
3. Detach the storage volume from the old instance.
4. Attach it to the new instance.
5. Point the Cassandra configuration on the new instance at this drive and set
auto_bootstrap: false.
6. Start Cassandra on the new instance. Once it has established connections
with its peers, you will notice that it takes over the token ranges on its
own. Doing a select on the system.peers table will show that the old node is
gone.
7. Run nodetool repair if need be.

On Tue, Jun 13, 2017 at 1:01 PM Rutvij Bhatt  wrote:

> Never mind, I misunderstood the first link. In this case, the replacement
> would just be leaving the listen_address as is (i.e.
> InetAddress.getLocalHost()) and starting the new instance up as you
> pointed out in your original answer, Hannu.
>
> Thanks.
>
> On Tue, Jun 13, 2017 at 12:35 PM Rutvij Bhatt  wrote:
>
>> Hannu/Nitan,
>>
>> Thanks for your help so far! From what you said in your first response, I
>> can get away with just attaching the EBS volume to Cassandra and starting
>> it with the old node's private IP as my listen_address because it will take
>> over the token assignment from the old node using the data files? With
>> regards to "Cassandra automatically realizes that have just effectively
>> changed IP address.", it says in the first link to change this manually to
>> the desired address - does this not apply in my case if I'm replacing the
>> old node?
>>
>> As for the plan I outlined earlier, is this more for DR scenarios where I
>> have lost a node due to hardware failure and I need to recover the data in
>> a safe manner by requesting a stream from the other replicas?  Am I
>> understanding this right?
>>
>>
>> On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger  wrote:
>>
>>> Hello,
>>>
>>> So the local information about tokens is stored in the system keyspace.
>>> Also the host id and all that.
>>>
>>> Also documented here:
>>>
>>> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>>>
>>> If for any reason that causes issues, you can also check this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-8382
>>>
>>> If you copy all Cassandra data, you are on the safe side. A good point in
>>> the links is that if you have IP addresses in topology or other files, then
>>> update those as well.
>>>
>>> Hannu
>>>
>>> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>>>
>>> Hannu,
>>>
>>> "Cassandra automatically realizes that have just effectively changed IP
>>> address” —> are you sure C* will take care of IP change as is? How will it
>>> know which token range to be assigned to this new IP address?
>>>
>>> On Jun 13, 2017, at 10:51 AM, Hannu Kröger  wrote:
>>>
>>> Cassandra automatically realizes that you have just effectively changed IP
>>> address
>>>
>>>
>>>


Upgrade from 3.0.6, where's the documentation?

2017-06-14 Thread Riccardo Ferrari
Hi list,

It's been a while since I upgraded my C* to 3.0.6; nevertheless, I would
like to give TWCS a try (available since 3.0.7).

What happened to the upgrade documentation? I used to follow a step-by-step
procedure from DataStax, but it looks like they are not maintaining it
anymore; on the flip side, I can't find anything meaningful on the
cassandra.apache.org website. What am I missing?

Are there any better resources other than:
https://github.com/apache/cassandra/blob/cassandra-3.0.13/NEWS.txt
https://github.com/apache/cassandra/blob/cassandra-3.0.13/CHANGES.txt

What is the most stable 3.0.X version -- is it the latest?

Thanks,


Re: Cannot achieve consistency level LOCAL_ONE

2017-06-14 Thread wxn...@zjqunshuo.com
Thanks for the detailed explanation. You did solve my problem.

Cheers,
-Simon
 
From: Oleksandr Shulgin
Date: 2017-06-14 17:09
To: wxn...@zjqunshuo.com
CC: user
Subject: Re: Cannot achieve consistency level LOCAL_ONE
On Wed, Jun 14, 2017 at 10:46 AM, wxn...@zjqunshuo.com  
wrote:
Thanks for the reply.
My system_auth settings are as below; what should I do with them? And I'm
interested in why the newly added node is responsible for the user authentication.

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

You should change the replication options to use NetworkTopologyStrategy and a 
replication factor greater than 1 in each DC.  It is not uncommon to set it to 
the number of nodes in the DC, and is actually recommended by the following 
piece of documentation: 

http://docs.datastax.com/en/cassandra/2.1/cassandra/security/security_config_native_authenticate_t.html

For version 3, the official doc says to set it to 3-5 nodes per DC: 
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureConfigNativeAuth.html

In general there is no drawback when setting the RF for system_auth to the 
number of nodes in DC, unless you're relying on the default superuser named 
"cassandra" being able to login at all times.  This user is special and it 
requires LOCAL_QUORUM in order to log in, while any other user (including 
non-default superusers) requires only LOCAL_ONE.

As to why the new node is responsible for authenticating your application
user: there is no particular reason for that.  The new node is assigned a
random set of tokens and happened to become responsible for that user, while
one of the old nodes is no longer responsible (remember, you have RF=1).

Hope this helps,
--
Alex



Re: Cannot achieve consistency level LOCAL_ONE

2017-06-14 Thread Oleksandr Shulgin
On Wed, Jun 14, 2017 at 10:46 AM, wxn...@zjqunshuo.com  wrote:

> Thanks for the reply.
> My system_auth settings are as below; what should I do with them? And I'm
> interested in why the newly added node is responsible for the user
> authentication.
>
> CREATE KEYSPACE system_auth WITH replication = {'class': '
> SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
>

You should change the replication options to use NetworkTopologyStrategy
and a replication factor greater than 1 in each DC.  It is not uncommon to
set it to the number of nodes in the DC, and is actually recommended by the
following piece of documentation:

http://docs.datastax.com/en/cassandra/2.1/cassandra/security/security_config_native_authenticate_t.html

For version 3, the official doc says to set it to 3-5 nodes per DC:
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureConfigNativeAuth.html
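
For illustration, a minimal sketch of applying that change from the Java
driver (3.x-era API; the contact point and the DC name "dc1" below are
placeholders for your own values, and the same ALTER statement can simply be
run from cqlsh instead):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class FixSystemAuthReplication {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Switch system_auth to NetworkTopologyStrategy with RF > 1 per DC.
                session.execute("ALTER KEYSPACE system_auth WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
            }
            // Afterwards run `nodetool repair system_auth` on each node so the
            // existing role/permission rows are streamed to the new replicas.
        }
    }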

In general there is no drawback when setting the RF for system_auth to the
number of nodes in DC, unless you're relying on the default superuser named
"cassandra" being able to login at all times.  This user is special and it
requires LOCAL_QUORUM in order to log in, while any other user (including
non-default superusers) requires only LOCAL_ONE.

As to why the new node is responsible for authenticating your application
user: there is no particular reason for that.  The new node is assigned a
random set of tokens and happened to become responsible for that user, while
one of the old nodes is no longer responsible (remember, you have RF=1).

Hope this helps,
--
Alex


Re: Cannot achieve consistency level LOCAL_ONE

2017-06-14 Thread wxn...@zjqunshuo.com
Thanks for the reply.
My system_auth settings are as below; what should I do with them? And I'm
interested in why the newly added node is responsible for the user authentication.

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

-Simon

 
From: Oleksandr Shulgin
Date: 2017-06-14 16:36
To: wxn...@zjqunshuo.com
CC: user
Subject: Re: Cannot achieve consistency level LOCAL_ONE
On Wed, Jun 14, 2017 at 9:11 AM, wxn...@zjqunshuo.com  
wrote:
Hi,
Cluster set up:
1 DC with 5 nodes (each node having 700GB data)
1 keyspace with RF of 2
write CL is LOCAL_ONE
read CL is LOCAL_QUORUM

One node was down for about 1 hour because of an OOM issue. During the down
period, all 4 other nodes reported "Cannot achieve consistency level LOCAL_ONE"
constantly until I brought up the dead node. My data seems to have been lost
during that down time. To me this should not happen, because the write CL is
LOCAL_ONE and only one node was dead. I have had a node go down before because
of an OOM issue, and I believe I didn't lose data then thanks to the hinted
handoff feature.

Hi,

The problem here is at a different level: not a single replica of the data 
could be written because no coordinator was available to serve the 
(authentication, see below) request.

One more thing: the dead node was added recently, and the only difference is
that the other 4 nodes are behind an internal SLB (Service Load Balancer) with
a VIP, while the new one is not.
Our application accesses the Cassandra cluster via the SLB VIP.

Any thoughts are appreciated.

Best regards,
-Simon
  
System log:
57659 Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE
  57660     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
  57661     at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
  57662     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
  57663     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
  57664     at org.apache.cassandra.auth.RolesCache.getRoles(RolesCache.java:70) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57665     at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:51) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57666     at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57667     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-2.2.8.jar:2.2.8]

What are the replication settings of your system_auth keyspace?  It looks like 
the node being down was responsible for the only replica of the user info 
needed to check its credentials/permissions.

Cheers,
--
Alex



Re: Cannot achieve consistency level LOCAL_ONE

2017-06-14 Thread Oleksandr Shulgin
On Wed, Jun 14, 2017 at 9:11 AM, wxn...@zjqunshuo.com 
wrote:

> Hi,
> Cluster set up:
> 1 DC with 5 nodes (each node having 700GB data)
> 1 keyspace with RF of 2
> write CL is LOCAL_ONE
> read CL is LOCAL_QUORUM
>
> One node was down for about 1 hour because of an OOM issue. During the down
> period, all 4 other nodes reported "Cannot achieve consistency
> level LOCAL_ONE" constantly until I brought up the dead node. My data
> seems to have been lost during that down time. To me this should not happen,
> because the write CL is LOCAL_ONE and only one node was dead. I have had a
> node go down before because of an OOM issue, and I believe I didn't lose
> data then thanks to the hinted handoff feature.
>

Hi,

The problem here is at a different level: not a single replica of the data
could be written because no coordinator was available to serve the
(authentication, see below) request.

> One more thing: the dead node was added recently, and the only difference is
> that the other 4 nodes are behind an internal SLB (Service Load Balancer)
> with a VIP, while the new one is not.
> Our application accesses the Cassandra cluster via the SLB VIP.
>
> Any thoughts are appreciated.
>
> Best regards,
> -Simon
>
> System log:
> 57659 Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE
>   57660     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
>   57661     at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
>   57662     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
>   57663     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
>   57664     at org.apache.cassandra.auth.RolesCache.getRoles(RolesCache.java:70) ~[apache-cassandra-2.2.8.jar:2.2.8]
>   57665     at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:51) ~[apache-cassandra-2.2.8.jar:2.2.8]
>   57666     at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-2.2.8.jar:2.2.8]
>   57667     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-2.2.8.jar:2.2.8]
>

What are the replication settings of your system_auth keyspace?  It looks
like the node being down was responsible for the only replica of the user
info needed to check its credentials/permissions.

Cheers,
--
Alex


Cannot achieve consistency level LOCAL_ONE

2017-06-14 Thread wxn...@zjqunshuo.com
Hi,
Cluster set up:
1 DC with 5 nodes (each node having 700GB data)
1 keyspace with RF of 2
write CL is LOCAL_ONE
read CL is LOCAL_QUORUM

One node was down for about 1 hour because of an OOM issue. During the down
period, all 4 other nodes reported "Cannot achieve consistency level LOCAL_ONE"
constantly until I brought up the dead node. My data seems to have been lost
during that down time. To me this should not happen, because the write CL is
LOCAL_ONE and only one node was dead. I have had a node go down before because
of an OOM issue, and I believe I didn't lose data then thanks to the hinted
handoff feature.

One more thing: the dead node was added recently, and the only difference is
that the other 4 nodes are behind an internal SLB (Service Load Balancer) with
a VIP, while the new one is not.
Our application accesses the Cassandra cluster via the SLB VIP.

Any thoughts are appreciated.

Best regards,
-Simon
  
System log:
57659 Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE
  57660     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
  57661     at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
  57662     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
  57663     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
  57664     at org.apache.cassandra.auth.RolesCache.getRoles(RolesCache.java:70) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57665     at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:51) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57666     at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57667     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57668     at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:124) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57669     at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:121) ~[apache-cassandra-2.2.8.jar:2.2.8]
  57670     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524) ~[guava-16.0.jar:na]
  57671     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317) ~[guava-16.0.jar:na]
  57672     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2280) ~[guava-16.0.jar:na]
  57673     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195) ~[guava-16.0.jar:na]
  57674   ... 25 common frames omitted