Re: DC aware failover

2017-11-15 Thread Alexander Dejanovski
Hi,

The policy is used in production, at least at my former company.

I can help if you have issues using it.

Cheers,

On Thu, Nov 16, 2017 at 08:32, CPC wrote:

> Hi,
>
> We want to implement a DC aware failover policy. For example, if the
> application cannot reach some part of the ring, or if we lose 50% of the local
> DC, then we want our application to switch to the other DC automatically. We
> found this project on GitHub
> https://github.com/adejanovski/cassandra-dcaware-failover but we don't
> know whether it is stable and used in production. Do you know about this
> project, or do you know of other projects that provide the same kind of
> functionality?
>
> Thanks...
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


DC aware failover

2017-11-15 Thread CPC
Hi,

We want to implement a DC aware failover policy. For example, if the
application cannot reach some part of the ring, or if we lose 50% of the local
DC, then we want our application to switch to the other DC automatically. We
found this project on GitHub: https://github.com/adejanovski/cassandra-dcaware-failover
but we don't know whether it is stable and used in production. Do you know
about this project, or do you know of other projects that provide the same
kind of functionality?

Thanks...


Re: Repair failing after it was interrupted once

2017-11-15 Thread Erick Ramirez
Check that there are no running repair threads on the nodes with nodetool
netstats.

For those that do have running repairs, restart C* on them to kill the
repair threads and you should be able to repair the nodes again. Cheers!
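
A rough sequence for that, run on each node that still shows repair activity
(the service name below is an assumption; adjust it for your install):

  # look for lingering repair streams
  nodetool netstats | grep -i repair
  # validation compactions also indicate an in-flight repair
  nodetool compactionstats -H
  # if anything is stuck, restart Cassandra on that node to clear it
  sudo systemctl restart cassandra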

On Wed, Nov 15, 2017 at 8:08 PM, Dipan Shah  wrote:

> Hello,
>
>
> I was running a "nodetool repair -pr" command on one node and due to some
> network issues I lost connection to the server.
>
>
> Now when I am running the same command on that and other servers too, the
> repair job is failing with the following log:
>
>
> [2017-11-15 03:55:19,965] Some repair failed
> [2017-11-15 03:55:19,965] Repair command #1 finished in 0 seconds
> error: Repair job has failed with the error message: [2017-11-15
> 03:55:19,965] Some repair failed
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2017-11-15 03:55:19,965] Some repair failed
> at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
> at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
>
> I found a few JIRA issues related to this but they were marked as fixed so
> I am not really sure if this is a bug. I am running Cassandra V 3.11.0.
>
>
> One stackoverflow post suggested that I should restart all nodes and that
> seems to be overkill.
>
>
> Can someone please guide me through this?
>
>
> Thanks,
>
> Dipan Shah
>


Re: Executing a check before replication / manual replication

2017-11-15 Thread Subroto Barua
Turn on auditing for the tables in question, scan the audit logs (using tools
like Splunk), and send alerts based on the activity...
On Wednesday, November 15, 2017, 12:33:30 PM PST, Abdelkrim Fitouri wrote:

Hi,

I know that Cassandra handles data replication between cluster nodes properly,
but for security reasons I am wondering how to avoid data replication after a
server node has been compromised and someone is executing modifications via
cqlsh.
Is there a possibility in Cassandra to execute a custom check / hook before
replication?
Is there a possibility to execute a manual replication between nodes?




-- 

Best Regards.

Abdelkarim FITOURI

System And Security Engineer

  

Executing a check before replication / manual replication

2017-11-15 Thread Abdelkrim Fitouri
Hi,

I know that Cassandra handles data replication between cluster nodes
properly, but for security reasons I am wondering how to avoid data
replication after a server node has been compromised and someone is
executing modifications via cqlsh.

Is there a possibility in Cassandra to execute a custom check / hook before
replication?

Is there a possibility to execute a manual replication between nodes?



-- 

Best Regards.

*Abdelkarim FITOURI*

System And Security Engineer


Re: CQL Map vs clustering keys

2017-11-15 Thread Jon Haddad
In 3.0, clustering columns are not actually part of the column name anymore.  
Yay.  Aaron Morton wrote a detailed analysis of the 3.x storage engine here: 
http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
 


The advantage of maps is that a single table can hold a very flexible data 
model, with maps and sets side by side in the same table.  Fun times.

The advantage of using clustering keys is performance, and you can store WAY 
more K/V pairs.  
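
A quick way to see the practical difference from cqlsh (3.x syntax; the
keyspace, table names and id value below are made-up examples):

  # map version: the whole map comes back with the row
  cqlsh -e "SELECT myMap FROM ks.map_table WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;"
  # clustering version: individual entries or ranges of entries can be read
  cqlsh -e "SELECT key, val FROM ks.ck_table WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204 AND key = 3;"
  cqlsh -e "SELECT key, val FROM ks.ck_table WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204 AND key >= 2 AND key <= 5;"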

Jon

> On Nov 15, 2017, at 8:12 AM, eugene miretsky  
> wrote:
> 
> Hi, 
> 
> What would be the tradeoffs between using
> 
> 1) Map
> 
> (
>   id UUID PRIMARY KEY,
>   myMap map<int, text>
> );
> 
> 2) Clustering key
> 
> (
>   id UUID,
>   key int,
>   val text,
>   PRIMARY KEY (id, key)
> );
> 
> My understanding is that maps are stored very similarly to clustering 
> columns, where the map key is part of the SSTable's column name. The main 
> difference seems to be that with maps all the key/value pairs get retrieved 
> together, while with clustering keys we can retrieve individual rows, or a 
> range of keys. 
> 
> Cheers,
> Eugene 



Re: CQL Map vs clustering keys

2017-11-15 Thread DuyHai Doan
Yes, your remark is correct.

However, once CASSANDRA-7396 (right now in 4.0 trunk) gets released, you
will be able to get a slice of map values using their (sorted) keys:

SELECT map[fromKey ... toKey] FROM TABLE ...

Needless to say, it will also be possible to get a single element from the
map by its key with the SELECT map[key] syntax.

It will work exactly like clustering columns storage engine-wise.
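
A sketch of what that could look like once it ships (syntax taken from the
ticket and still subject to change before release; names below are made up):

  # single element by key
  cqlsh -e "SELECT myMap[3] FROM ks.map_table WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;"
  # slice of the map between two (sorted) keys
  cqlsh -e "SELECT myMap[2..5] FROM ks.map_table WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;"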



On Wed, Nov 15, 2017 at 5:12 PM, eugene miretsky 
wrote:

> Hi,
>
> What would be the tradeoffs between using
>
> 1) Map
>
> (
>   id UUID PRIMARY KEY,
>   myMap map<int, text>
> );
>
> 2) Clustering key
>
> (
>   id UUID,
>   key int,
>   val text,
>   PRIMARY KEY (id, key)
> );
>
> My understanding is that maps are stored very similarly to clustering
> columns, where the map key is part of the SSTable's column name. The main
> difference seems to be that with maps all the key/value pairs get retrieved
> together, while with clustering keys we can retrieve individual rows, or a
> range of keys.
>
> Cheers,
> Eugene
>


Re: Reaper 1.0

2017-11-15 Thread Jon Haddad
Apache 2 Licensed, just like Cassandra.  
https://github.com/thelastpickle/cassandra-reaper/blob/master/LICENSE.txt 


Feel free to modify, put in prod, fork or improve. 

Unfortunately I had to re-upload the Getting Started video, as we had accidentally 
uploaded a first cut.  The correct link is here: 
https://www.youtube.com/watch?v=0dub29BgwPI 


Jon

> On Nov 15, 2017, at 9:14 AM, Harika Vangapelli -T (hvangape - AKRAYA INC at 
> Cisco)  wrote:
> 
> Open source, free to use in production? Any license constraints? Please let 
> me know.
>  
> I experimented with it yesterday, really liked it.
>  
> 
>  
> Harika Vangapelli
> Engineer - IT
> hvang...@cisco.com 
> Tel:
> Cisco Systems, Inc.
> 
> 
> 
> United States
> cisco.com  
> Think before you print.
> This email may contain confidential and privileged material for the sole use 
> of the intended recipient. Any review, use, distribution or disclosure by 
> others is strictly prohibited. If you are not the intended recipient (or 
> authorized to receive for the recipient), please contact the sender by reply 
> email and delete all copies of this message.
> Please click here for Company Registration Information.
>  
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
> Sent: Tuesday, November 14, 2017 2:18 PM
> To: user >
> Subject: Reaper 1.0
>  
> We’re excited to announce the release of the 1.0 version of Reaper for Apache 
> Cassandra!  We’ve made a lot of improvements to the flexibility of managing 
> repairs and simplified the UI based on feedback we’ve received. 
>  
> We’ve written a blog post discussing the changes in detail here: 
> http://thelastpickle.com/blog/2017/11/14/reaper-10-announcement.html 
> 
>  
> We also have a new YouTube video to help folks get up and running quickly: 
> https://www.youtube.com/watch?v=YKJRRFa22T4 
> 
>  
> The reaper site has all the docs should you have any questions: 
> http://cassandra-reaper.io/ 
>  
> Thanks all,
> Jon 



RE: Reaper 1.0

2017-11-15 Thread Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)
Open source, free to use in production? Any license constraints? Please let me 
know.

I experimented with it yesterday, really liked it.




Harika Vangapelli
Engineer - IT
hvang...@cisco.com
Tel:

Cisco Systems, Inc.



United States
cisco.com


Think before you print.

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorized to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.
Please click here for Company Registration Information.


From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Tuesday, November 14, 2017 2:18 PM
To: user 
Subject: Reaper 1.0

We’re excited to announce the release of the 1.0 version of Reaper for Apache 
Cassandra!  We’ve made a lot of improvements to the flexibility of managing 
repairs and simplified the UI based on feedback we’ve received.

We’ve written a blog post discussing the changes in detail here: 
http://thelastpickle.com/blog/2017/11/14/reaper-10-announcement.html

We also have a new YouTube video to help folks get up and running quickly: 
https://www.youtube.com/watch?v=YKJRRFa22T4

The reaper site has all the docs should you have any questions: 
http://cassandra-reaper.io/

Thanks all,
Jon


Re: TWCS decommission and cleanups

2017-11-15 Thread Jeff Jirsa
It does the right thing - sstables sent to other nodes maintain their min/max 
timestamps so they’ll go to the right buckets

The bucket is selected using the timestamp of the newest cell in the sstable

If you run a major compaction, you would undo that bucketing 

Cleanup works by compacting an sstable with itself and excluding any partition 
no longer owned - it makes a new sstable, the original is deleted when the new 
one is finished
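
If you want to see that bucketing for yourself after a decommission or cleanup,
the min/max timestamps are visible per sstable (paths below assume a packaged
install; keyspace and table names are placeholders):

  # print min/max timestamps for each sstable of the table
  for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do
    echo "$f"
    sstablemetadata "$f" | grep -i timestamp
  done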


-- 
Jeff Jirsa
 

> On Nov 15, 2017, at 7:57 AM, Benjamin Heiskell  wrote:
> 
> Hello all,
> 
> How does TWCS work when decommissioning a node? Does the data distribute 
> across the other nodes to the current time window's sstable (like 
> read-repairs)? Or will it compact into the sstables for the prior windows? If 
> that's how it works, how does it decide what sstable to compact with? E.g., 
> if I ran a major compaction on one node (pushing everything into a few 
> sstables), then decommissioned it, what would you expect to see on other 
> nodes? Would the data distribute across all sstables based on the row's 
> timestamp?
> 
> I'm also a little unclear on how cleanups work. If I run a nodetool cleanup, 
> will that actually re-write the old, but not completely expired sstables of 
> prior time windows? Or are those sstables truly immutable?
> 
> I've read through the documentation 
> (http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
>  and a few other resources 
> (http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html), but I still don't 
> have a clear understanding of how it works.
> 
> Any clarification would be greatly appreciated. Thanks!
> Ben


CQL Map vs clustering keys

2017-11-15 Thread eugene miretsky
Hi,

What would be the tradeoffs between using

1) Map

(
  id UUID PRIMARY KEY,
  myMap map<int, text>
);

2) Clustering key

(
  id UUID,
  key int,
  val text,
  PRIMARY KEY (id, key)
);

My understanding is that maps are stored very similarly to clustering
columns, where the map key is part of the SSTable's column name. The main
difference seems to be that with maps all the key/value pairs get retrieved
together, while with clustering keys we can retrieve individual rows, or a
range of keys.

Cheers,
Eugene


TWCS decommission and cleanups

2017-11-15 Thread Benjamin Heiskell
Hello all,

How does TWCS work when decommissioning a node? Does the data distribute
across the other nodes to the current time window's sstable (like
read-repairs)? Or will it compact into the sstables for the prior windows?
If that's how it works, how does it decide what sstable to compact with?
E.g., if I ran a major compaction on one node (pushing everything into a
few sstables), then decommissioned it, what would you expect to see on
other nodes? Would the data distribute across all sstables based on the
row's timestamp?

I'm also a little unclear on how cleanups work. If I run a nodetool
cleanup, will that actually re-write the old, but not completely expired
sstables of prior time windows? Or are those sstables truly immutable?

I've read through the documentation (
http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
and a few other resources (
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html), but I still
don't have a clear understanding of how it works.

Any clarification would be greatly appreciated. Thanks!
Ben


Re: Node Failure Scenario

2017-11-15 Thread Anshu Vajpayee
Thank you Jonathan and all.

On Tue, Nov 14, 2017 at 10:53 PM, Jonathan Haddad  wrote:

> Anthony’s suggestion of using replace_address_first_boot lets you avoid that
> requirement, and it’s specifically why it was added in 2.2.
> On Tue, Nov 14, 2017 at 1:02 AM Anshu Vajpayee 
> wrote:
>
>> Thanks guys,
>>
>> I think it is better to pass replace_address on the command line rather than
>> update the cassandra-env file, so that there is no requirement to remove it
>> later.
>>
>>
>> On Tue, Nov 14, 2017 at 6:32 AM, Anthony Grasso > > wrote:
>>
>>> Hi Anshu,
>>>
>>> To add to Erick's comment, remember to remove the *replace_address* method
>>> from the *cassandra-env.sh* file once the node has rejoined
>>> successfully. The node will fail the next restart otherwise.
>>>
>>> Alternatively, use the *replace_address_first_boot* method which works
>>> exactly the same way as *replace_address* the only difference is there
>>> is no need to remove it from the *cassandra-env.sh* file.
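>>>
>>> A minimal sketch of both approaches (the IP below is a placeholder for the
>>> address of the node being replaced):
>>>
>>>   # one-off, passed on the command line at start-up
>>>   cassandra -Dcassandra.replace_address_first_boot=10.0.0.12
>>>
>>>   # or added to cassandra-env.sh (remove it again afterwards if you use
>>>   # plain replace_address)
>>>   JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"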
>>>
>>> Kind regards,
>>> Anthony
>>>
>>> On 13 November 2017 at 14:59, Erick Ramirez 
>>> wrote:
>>>
 Use the replace_address method with its own IP address. Make sure you
 delete the contents of the following directories:
 - data/
 - commitlog/
 - saved_caches/

 Forget rejoining with repair -- it will just cause more problems.
 Cheers!

 On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <
 anshu.vajpa...@gmail.com> wrote:

> Hi All ,
>
> There was a node failure in one of our production clusters due to a disk
> failure.  After h/w recovery that node is now ready to be part of the cluster,
> but it doesn't have any data due to the disk crash.
>
>
>
> I can think of the following options:
>
> 1. Replace the node with itself, using replace_address.
>
> 2. Set bootstrap=false, start the node, and run repair to stream
> the data.
>
>
>
> Please suggest if both options are good and which is best in your
> experience. This is a live production cluster.
>
>
> Thanks,
>
>
> --
> Cheers,
> Anshu V
>
>
>

>>>
>>
>>
>> --
>> Cheers,
>> Anshu V
>>
>>
>>


-- 
Cheers,
Anshu V


Re: High IO Util using TimeWindowCompaction

2017-11-15 Thread Alexander Dejanovski
Hi Kurt,

it seems highly unlikely that TWCS is responsible for your problems, since
you're throttling compaction way below what i3 instances can provide.
For such instances, we would advise using 8 concurrent compactors with a
high compaction throughput (>200MB/s, if not unthrottled).

We've had reports and observed some inconsistent I/O behaviors with some i3
instances (not much lately though), so it could be what's biting you.
It would be helpful to provide a bit more info here to troubleshoot this:

   - The output of the following command during one of the 100% util phases: iostat -dmx 2 50
   - The output of: nodetool tablehistograms prod_dedupe event_hashes
   - The output of the following command during one of the 100% util phases: nodetool compactionstats -H
   - The output of: nodetool tpstats


Since you have very tiny partitions, we would advise lowering or disabling
readahead, but you're not performing reads anyway on that cluster.
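
A sketch of those knobs (the readahead device below is an assumption; check
yours with lsblk):

  # raise compaction throughput and concurrent compactors on the running node
  nodetool setcompactionthroughput 0   # 0 = unthrottled
  nodetool setconcurrentcompactors 8   # if your build lacks this command, set concurrent_compactors in cassandra.yaml and restart
  # lower readahead on the RAID0 device (value is in 512-byte sectors)
  sudo blockdev --setra 8 /dev/md0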

It would be good to check how 3.11 with TWCS performs on the same hardware
as the 3.7 cluster (c3.4xl) to narrow down the suspect list. Any chance you
can test this?
Also, which OS are you using on the i3 instances?

Thanks



On Mon, Nov 13, 2017 at 11:51 PM Kurtis Norwood  wrote:

> I've been testing out cassandra 3.11 (currently using 3.7) and have been
> observing really high io util occasionally that sometimes results in
> temporary flatlining at 100% io util for an extended period. I think my use
> case is pretty simple and currently only testing part of it on this new
> version so looking for advice on what might be going wrong.
>
> Use Case: I am using cassandra as basically a large "set", my table schema
> is incredibly simple, just a primary key. Records are all written with the
> same TTL (7 days). Only queries are inserting a key (which we expect to
> only happen once) and checking whether that key exists in the table. In my
> 3.7 cluster I am using DateTieredCompaction and running on c3.4xlarge (x30)
> in AWS. I've been experimenting with i3.4xlarge and wanted to also try
> TimeWindowCompaction to see if we could get better performance when adding
> machines to the cluster, that was always a really painful experience in 3.7
> with DateTieredCompaction and the docs say TimeWindowCompaction is ideal
> for my use case.
>
> Right now I am running a new cluster with 3.11 and TimeWindowCompaction
> alongside the old cluster and doing writes to both. Only reads go to the
> old cluster while I go through this preliminary testing. So the 3.11
> cluster receives between 90K to 150K writes/second and no reads. This
> morning for a period of about 30 minutes the cluster was at 100% ioutil and
> eventually recovered from this state. At that time it was only receiving
> ~100K writes/second. I don't see anything interesting in the logs that
> indicate what is going on, and I don't think a sudden compaction is the
> issue since I have limits on compaction throughput.
>
> Staying on 3.7 would be a major bummer so looking for advice.
>
> Some information that might be useful:
>
> compaction throughput - 16MB/s
> concurrent compactors - 4
> machine type - i3.4xlarge (x20)
> disk - RAID0 across 2 NVMe SSDs
>
> Table Schema looks like this:
>
> CREATE TABLE prod_dedupe.event_hashes (
>
> app int,
>
> hash_value blob,
>
> PRIMARY KEY ((app, hash_value))
>
> ) WITH bloom_filter_fp_chance = 0.01
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
> AND comment = 'For deduping'
>
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '4', 'compaction_window_unit': 'HOURS',
> 'max_threshold': '64', 'min_threshold': '4'}
>
> AND compression = {'chunk_length_in_kb': '4', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.0
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 3600
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = 'NONE';
>
>
> Thanks,
> Kurt
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Repair failing after it was interrupted once

2017-11-15 Thread Dipan Shah
Hello,


I was running a "nodetool repair -pr" command on one node and due to some 
network issues I lost connection to the server.


Now when I am running the same command on that and other servers too, the 
repair job is failing with the following log:


[2017-11-15 03:55:19,965] Some repair failed
[2017-11-15 03:55:19,965] Repair command #1 finished in 0 seconds
error: Repair job has failed with the error message: [2017-11-15 03:55:19,965] 
Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2017-11-15 03:55:19,965] Some repair failed
at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

I found a few JIRA issues related to this but they were marked as fixed so I am 
not really sure if this is a bug. I am running Cassandra V 3.11.0.


One stackoverflow post suggested that I should restart all nodes and that seems 
to be overkill.


Can someone please guide me through this?


Thanks,

Dipan Shah