Re: Handle Leap Seconds with Cassandra

2016-11-02 Thread Ben Bromhead
Based on what I've said previously, pretty much every way of avoiding your
leap-second ordering issue is going to be a "hack", and there will be some
amount of hope involved.

If the updates occur more than 300ms apart and you are confident your nodes
have clocks that are within 150ms of each other, then I'd close my eyes and
hope they all apply the leap second at the same time, within that 150ms.

If they are less than 300ms apart (I'm guessing you meant less than 300ms),
then I would look to figure out what the smallest gap is between those two
updates and make sure your nodes' clocks are close enough that the leap
second will occur on all nodes within that gap.

If that's not good enough, you could just halt those scenarios for 2
seconds over the leap second and then resume them once you've confirmed all
clocks have skipped.


On Wed, 2 Nov 2016 at 18:13 Anuj Wadehra  wrote:

> Thanks Ben for taking the time for the detailed reply!
>
> We don't need strict ordering for all operations, but we are looking for
> scenarios where 2 quick updates to the same column of the same row are
> possible. By quick updates, I mean >300 ms. Configuring NTP properly (as
> mentioned in some blogs in your link) should give fair relative accuracy
> between the Cassandra nodes. But a leap second takes the clock back by an
> ENTIRE second (huge), and the probability of an old write overwriting the
> new one increases drastically. So, we want to be proactive about it.
>
> I agree that you should avoid such scenarios by design (if possible).
>
> Good to know that you guys have set up your own NTP servers as per the
> recommendation. Curious... do you also do some monitoring around NTP?
>
>
>
> Thanks
> Anuj
>
> On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead wrote:
> If you need guaranteed strict ordering in a distributed system, I would
> not use Cassandra; Cassandra does not provide this out of the box. I would
> look to a system that uses Lamport or vector clocks. Based on your
> description of how your system runs at the moment (and how close your
> updates are together), you have either already experienced out-of-order
> updates or there is a real possibility you will in the future.
>
> Sorry to be so dire, but if you do require causal consistency / strict
> ordering, you are not getting it at the moment. Distributed systems theory
> is really tricky, even for people that are "experts" on distributed systems
> over unreliable networks (I would certainly not put myself in that
> category). People have made a very good name for themselves by showing that
> the vast majority of distributed databases have had bugs when it comes to
> their various consistency models and the claims these databases make.
>
> So make sure you really do need guaranteed causal consistency/strict
> ordering or if you can design around it (e.g. using conflict free
> replicated data types) or choose a system that is designed to provide it.
>
> Having said that... here are some hacky things you could do in Cassandra
> to try and get this behaviour, which I in no way endorse doing :)
>
>- Cassandra counters do leverage a logical clock per shard and you
>could hack something together with counters and lightweight transactions,
>    but you would want to do your homework on counter accuracy before
>diving into it... as I don't know if the implementation is safe in the
>context of your question. Also this would probably require a significant
>rework of your application plus a significant performance hit. I would
>invite a counter guru to jump in here...
>
>
>- You can leverage the fact that timestamps are monotonic if you
>    isolate writes to a single node for a single shard... but you then lose
>Cassandra's availability guarantees, e.g. a keyspace with an RF of 1 and a
>CL of > ONE will get monotonic timestamps (if generated on the server
>side).
>
>
>- Continuing down the path of isolating writes to a single node for a
>given shard you could also isolate writes to the primary replica using your
>client driver during the leap second (make it a minute either side of the
>leap), but again you lose out on availability and you are probably already
>    experiencing out-of-order writes given how close your writes and updates
>are.
>
>
> A note on NTP: NTP is generally fine if you use it to keep the clocks
> synced between the Cassandra nodes. If you are interested in how we have
> implemented NTP at Instaclustr, see our blogpost on it
> https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/
> .
>
>
>
> Ben
>
>
> On Thu, 27 Oct 2016 at 10:18 Anuj Wadehra  wrote:
>
> Hi Ben,
>
> Thanks for your reply. We don't use timestamps in the primary key. We rely
> on server-side timestamps generated by the coordinator. So, no functions at
> the client side would help.
>
> Yes, drifts can create problems too. But even if you 

Re: Rebuilding with vnodes

2016-11-02 Thread kurt Greaves
If the network and both DCs can handle the load, it's fine. You'll want to
keep an eye on the logs for streaming failures, as they're not always
completely obvious and you could end up with missing data. You should
definitely be aware that rebuilds affect the source DC, so if it's under
load you want to be careful about impacting it.

I'm not sure that memtable_cleanup_threshold affects streamed SSTables; it
seems unlikely that the streamed SSTables would also be added to memtables,
though obviously your DC would be receiving writes simultaneously. 0.7
seems quite high; what are your heap settings and memtable_flush_writers?

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 2 November 2016 at 19:59, Anubhav Kale 
wrote:

> Hello,
>
>
>
> I am trying to rebuild a new Data Center with 50 Nodes, and expect 1 TB /
> node. Nodes are backed by SSDs, and the rebuild is happening from another
> DC in same physical region. This is with 2.1.13.
>
>
>
> I am doing this with stream_throughput=200 MB, concurrent_compactors=256,
> compactionthroughput=0, and memtable_cleanup_threshold=0.7. (memtable
> setting was necessary to keep # SSTable files in check) and running rebuild
> 20 nodes at a time.
>
>
>
> Have people generally attempted such large rebuilds? Any tips?
>
>
>
> Thanks !
>
>
>


Re: Handle Leap Seconds with Cassandra

2016-11-02 Thread Anuj Wadehra
Thanks Ben for taking the time for the detailed reply!
We don't need strict ordering for all operations, but we are looking for
scenarios where 2 quick updates to the same column of the same row are
possible. By quick updates, I mean >300 ms. Configuring NTP properly (as
mentioned in some blogs in your link) should give fair relative accuracy
between the Cassandra nodes. But a leap second takes the clock back by an
ENTIRE second (huge), and the probability of an old write overwriting the
new one increases drastically. So, we want to be proactive about it.
I agree that you should avoid such scenarios by design (if possible).
Good to know that you guys have set up your own NTP servers as per the
recommendation. Curious... do you also do some monitoring around NTP?


Thanks
Anuj 
 
 On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead wrote:  
If you need guaranteed strict ordering in a distributed system, I would not use
Cassandra; Cassandra does not provide this out of the box. I would look to a
system that uses Lamport or vector clocks. Based on your description of how
your system runs at the moment (and how close your updates are together), you
have either already experienced out-of-order updates or there is a real
possibility you will in the future.
Sorry to be so dire, but if you do require causal consistency / strict 
ordering, you are not getting it at the moment. Distributed systems theory is 
really tricky, even for people that are "experts" on distributed systems over 
unreliable networks (I would certainly not put myself in that category). People 
have made a very good name for themselves by showing that the vast majority of 
distributed databases have had bugs when it comes to their various consistency 
models and the claims these databases make.
So make sure you really do need guaranteed causal consistency/strict ordering 
or if you can design around it (e.g. using conflict free replicated data types) 
or choose a system that is designed to provide it.
Having said that... here are some hacky things you could do in Cassandra to try 
and get this behaviour, which I in no way endorse doing :)    
   - Cassandra counters do leverage a logical clock per shard and you could 
hack something together with counters and lightweight transactions, but you 
would want to do your homework on counter accuracy before diving into
it... as I don't know if the implementation is safe in the context of your 
question. Also this would probably require a significant rework of your 
application plus a significant performance hit. I would invite a counter guru 
to jump in here... 
   
   - You can leverage the fact that timestamps are monotonic if you isolate
writes to a single node for a single shard... but you then lose Cassandra's
availability guarantees, e.g. a keyspace with an RF of 1 and a CL of > ONE will
get monotonic timestamps (if generated on the server side).
   
   - Continuing down the path of isolating writes to a single node for a given 
shard you could also isolate writes to the primary replica using your client 
driver during the leap second (make it a minute either side of the leap), but 
again you lose out on availability and you are probably already experiencing 
out-of-order writes given how close your writes and updates are.


A note on NTP: NTP is generally fine if you use it to keep the clocks synced 
between the Cassandra nodes. If you are interested in how we have implemented 
NTP at Instaclustr, see our blogpost on it 
https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/.


Ben  

On Thu, 27 Oct 2016 at 10:18 Anuj Wadehra  wrote:

Hi Ben,
Thanks for your reply. We don't use timestamps in the primary key. We rely on
server-side timestamps generated by the coordinator. So, no functions at the
client side would help.
Yes, drifts can create problems too. But even if you ensure that nodes are
perfectly synced with NTP, you will surely mess up the order of updates during
the leap second (interleaving). Some applications update the same column of the
same row quickly (within a second), and reversing the order would corrupt the
data.
I am interested in learning how people relying on a strict order of updates
handle the leap second scenario when the clock goes back one second (the same
second is repeated). What kinds of tricks do people use to ensure that
server-side timestamps are monotonic?
As per my understanding, NTP slew mode may not be suitable for Cassandra as it
may cause unpredictable drift amongst the Cassandra nodes. Ideas?

Thanks
Anuj


Sent from Yahoo Mail on Android 
 
On Thu, 20 Oct, 2016 at 11:25 PM, Ben Bromhead wrote:
http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a pretty 
good overview

If you are using a timestamp as part of your primary key, this is the situation 
where you could end up overwriting data. I would suggest using timeuuid instead 
which will ensure that 

Re: Backup restore with a different name

2016-11-02 Thread Jens Rantil
Bryan,

On Wed, Nov 2, 2016 at 11:38 AM, Bryan Cheng  wrote:

> do you mean restoring the cluster to that state, or just exposing that
> state for reference while keeping the (corrupt) current state in the live
> cluster?


I mean "exposing that state for reference while keeping the (corrupt)
current state in the live cluster".

Cheers,
Jens

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook  Linkedin

 Twitter 


Re: Backup restore with a different name

2016-11-02 Thread Jens Rantil
Thanks Anubhav,

Looks like a Java project without any documentation whatsoever ;) How do I
use the tool? What does it do?

Cheers,
Jens

On Wed, Nov 2, 2016 at 11:36 AM, Anubhav Kale 
wrote:

> You would have to build some logic on top of what’s natively supported.
>
>
>
> Here is an option: https://github.com/anubhavkale/CassandraTools/
> tree/master/BackupRestore
>
>
>
>
>
> *From:* Jens Rantil [mailto:jens.ran...@tink.se]
> *Sent:* Wednesday, November 2, 2016 2:21 PM
> *To:* Cassandra Group 
> *Subject:* Backup restore with a different name
>
>
>
> Hi,
>
>
>
> Let's say I am periodically making snapshots of a table, say "users", for
> backup purposes. Let's say a developer makes a mistake and corrupts the
> table. Is there an easy way for me to restore a replica, say
> "users_20161102", of the original table for the developer to looks at the
> old copy?
>
>
>
> Cheers,
>
> Jens
>
>
>
> --
>
> Jens Rantil
>
> Backend engineer
>
> Tink AB
>
>
>
> Email: jens.ran...@tink.se
>
> Phone: +46 708 84 18 32
>
> Web: www.tink.se
> 
>
>
>
> Facebook
> 
>  Linkedin
> 
>  Twitter
> 
>



-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook  Linkedin

 Twitter 


Re: Backup restore with a different name

2016-11-02 Thread Bryan Cheng
Hi Jens,

When you refer to restoring a snapshot for a developer to look at, do you
mean restoring the cluster to that state, or just exposing that state for
reference while keeping the (corrupt) current state in the live cluster?

You may find these useful:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

Additionally, AFAIK the snapshot files are just SSTables, so any utility
that can examine them (for example, sstable2json) should work on those
files as well.

On Wed, Nov 2, 2016 at 2:20 PM, Jens Rantil  wrote:

> Hi,
>
> Let's say I am periodically making snapshots of a table, say "users", for
> backup purposes. Let's say a developer makes a mistake and corrupts the
> table. Is there an easy way for me to restore a replica, say
> "users_20161102", of the original table for the developer to looks at the
> old copy?
>
> Cheers,
> Jens
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook  Linkedin
> 
>  Twitter 
>


RE: Backup restore with a different name

2016-11-02 Thread Anubhav Kale
You would have to build some logic on top of what’s natively supported.

Here is an option: 
https://github.com/anubhavkale/CassandraTools/tree/master/BackupRestore


From: Jens Rantil [mailto:jens.ran...@tink.se]
Sent: Wednesday, November 2, 2016 2:21 PM
To: Cassandra Group 
Subject: Backup restore with a different name

Hi,

Let's say I am periodically making snapshots of a table, say "users", for 
backup purposes. Let's say a developer makes a mistake and corrupts the table. 
Is there an easy way for me to restore a replica, say "users_20161102", of the
original table for the developer to look at the old copy?

Cheers,
Jens

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: 
www.tink.se

Facebook
 
Linkedin
 
Twitter


Backup restore with a different name

2016-11-02 Thread Jens Rantil
Hi,

Let's say I am periodically making snapshots of a table, say "users", for
backup purposes. Let's say a developer makes a mistake and corrupts the
table. Is there an easy way for me to restore a replica, say
"users_20161102", of the original table for the developer to looks at the
old copy?

Cheers,
Jens

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook  Linkedin

 Twitter 


Re: Cassandra Poor Read Performance Response Time

2016-11-02 Thread Jens Rantil
Hi,

I am by no means an expert on Cassandra, nor on
DateTieredCompactionStrategy. However, looking in "Query 2.xlsx" I see a
lot of

Partition index with 0 entries found for sstable 186

To me, that looks like Cassandra is looking at a lot of SSTables and
realizing too late that they don't contain any relevant data. Are you using
TTLs when you write data? Do the TTLs vary? If they do, there's a risk
Cassandra will have to inspect a lot of SSTables that turn out to hold only
expired data. Also, have you checked `nodetool cfstats` and bloom filter
false positives?

Does `nodetool cfhistograms` give you any insights? I'm mostly thinking in
terms of unbalanced partition keys.

Have you checked the logs for how long GC pauses are being taken?

Somewhat implementation specific: Would adjusting the time bucket to a
smaller time resolution be an option?
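
To make that concrete, here's a rough Scala sketch of what a finer bucket could
look like, assuming time_bucket is currently the day in epoch milliseconds
(which is how I read "200 million records per day (1 time_bucket)"); the object
and method names are just illustrative:

import java.util.concurrent.TimeUnit

// Sketch: derive the partition-key bucket from the event time. A finer
// (e.g. hourly) bucket means smaller partitions and fewer SSTables that
// any single partition read has to touch.
object TimeBuckets {
  def dailyBucket(epochMillis: Long): Long =
    epochMillis - (epochMillis % TimeUnit.DAYS.toMillis(1))

  def hourlyBucket(epochMillis: Long): Long =
    epochMillis - (epochMillis % TimeUnit.HOURS.toMillis(1))
}

The trade-off is that a read covering a full day then becomes 24 partition
lookups (one per hourly bucket), which the client can issue in parallel.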

Also, since you are using DateTieredCompactionStrategy, have you considered
using a TIMESTAMP constraint[1]? That might help you a lot actually.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5514

Cheers,
Jens

On Mon, Oct 31, 2016 at 11:10 PM, _ _  wrote:

> Hi
>
> Currently i am running a cassandra cluster of 3 nodes (with it replicating
> to both nodes) and am experiencing poor performance, usually getting second
> response times when running queries when i am expecting/needing millisecond
> response times. Currently i have a table which looks like:
>
> CREATE TABLE tracker.all_ad_impressions_counter_1d (
> time_bucket bigint,
> ad_id text,
> uc text,
> count counter,
> PRIMARY KEY ((time_bucket, ad_id), uc)
> ) WITH CLUSTERING ORDER BY (uc ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'base_time_seconds': '3600', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
> 'max_sstable_age_days': '30', 'max_threshold': '32', 'min_threshold': '4',
> 'timestamp_resolution': 'MILLISECONDS'}
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
>
> and queries which look like:
>
> SELECT
> time_bucket,
> uc,
> count
> FROM
> all_ad_impressions_counter_1d
>
> WHERE ad_id = ?
> AND time_bucket = ?
>
> the cluster is running on servers with 16 GB RAM, 4 CPU cores, and 3 x
> 100GB datastores; the storage is not local and these VMs are being managed
> through OpenStack. There are roughly 200 million records being written per
> day (1 time_bucket) and maybe a few thousand records per partition
> (time_bucket, ad_id) at most. The amount of writes is not having a
> significant effect on our read performance as when writes are stopped, the
> read response time does not improve noticeably. I have attached a trace of
> one query i ran which took around 3 seconds which i would expect to take
> well below a second. I have also included the cassandra.yaml file and jvm
> options file. We do intend to change the storage to local storage and
> expect this will have a significant impact but i was wondering if there's
> anything else which could be changed which will also have a significant
> impact on read performance?
>
> Thanks
> Ian
>
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook  Linkedin

 Twitter 


Re: Introducing Cassandra 3.7 LTS

2016-11-02 Thread Ben Bromhead
We are not publishing the build artefacts for our LTS at the moment as we
don't test them on the different distros (Debian/Ubuntu, CentOS, etc.). If
anyone wishes to do so, feel free to create a PR and submit them!

On Wed, 2 Nov 2016 at 11:37 Jesse Hodges  wrote:

> awesome, thanks for the tip!
>
> -Jesse
>
> On Wed, Nov 2, 2016 at 12:39 PM, Benjamin Roth 
> wrote:
>
> You can build one on your own very easily. Just check out the desired git
> repo and do this:
>
>
> http://stackoverflow.com/questions/8989192/how-to-package-the-cassandra-source-code-into-debian-package
>
> 2016-11-02 17:35 GMT+01:00 Jesse Hodges :
>
> Just curious, has anybody created a debian package for this?
>
> Thanks, Jesse
>
> On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang  wrote:
>
> This is awesome! Stability is the king.
>
> Thank you so much!
>
> On Oct 19, 2016 2:56 PM, "Ben Bromhead"  wrote:
>
> Hi All
>
> I am proud to announce we are making available our production build of
> Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
> customers). Our release of Cassandra 3.7 includes a number of backported
> patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
> include the new features of these releases.
>
> You can find our release of Cassandra 3.7 LTS on github here (
> https://github.com/instaclustr/cassandra). You can read more of our
> thinking and how this applies to our managed service here (
> https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).
>
> We also have an expanded FAQ about why and how we are approaching 3.x in
> this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts),
> however I've included the top few question and answers below:
>
> *Is this a fork?*
> No, this is just Cassandra with a different release cadence for those who
> want 3.x features but are slightly more risk averse than the current
> schedule allows.
>
> *Why not just use the official release?*
> With the 3.x tick-tock branch we have encountered more instability than
> with the previous release cadence. We feel that releasing new features
> every other release makes it very hard for operators to stabilize their
> production environment without bringing in brand new features that are not
> battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
> bug fix branch included new and real-world untested features, specifically
> CDC. We have decided to stick with Cassandra 3.7 and instead backport
> critical issues and maintain it ourselves rather than trying to stick with
> the current Apache Cassandra release cadence.
>
> *Why backport?*
> At Instaclustr we support and run a number of different versions of Apache
> Cassandra on behalf of our customers. Over the course of managing Cassandra
> for our customers we often encounter bugs. There are existing patches for
> some of them, others we patch ourselves. Generally, if we can, we try to
> wait for the next official Apache Cassandra release; however, when needed to
> ensure our customers remain stable and running, we will sometimes backport
> bug fixes and write our own hotfixes (which are also submitted back to the
> community).
>
> *Why release it?*
> A number of our customers and people in the community have asked if we
> would make this available, which we are more than happy to do. This
> repository represents what Instaclustr runs in production for Cassandra 3.7
> and this is our way of helping the community get a similar level of
> stability as what you would get from our managed service.
>
> Cheers
>
> Ben
>
>
>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
> --
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer


Re: Introducing Cassandra 3.7 LTS

2016-11-02 Thread Jesse Hodges
awesome, thanks for the tip!

-Jesse

On Wed, Nov 2, 2016 at 12:39 PM, Benjamin Roth 
wrote:

> You can build one on your own very easily. Just check out the desired git
> repo and do this:
>
> http://stackoverflow.com/questions/8989192/how-to-
> package-the-cassandra-source-code-into-debian-package
>
> 2016-11-02 17:35 GMT+01:00 Jesse Hodges :
>
>> Just curious, has anybody created a debian package for this?
>>
>> Thanks, Jesse
>>
>> On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang  wrote:
>>
>>> This is awesome! Stability is the king.
>>>
>>> Thank you so much!
>>>
>>> On Oct 19, 2016 2:56 PM, "Ben Bromhead"  wrote:
>>>
 Hi All

 I am proud to announce we are making available our production build of
 Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
 customers). Our release of Cassandra 3.7 includes a number of backported
 patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
 include the new features of these releases.

 You can find our release of Cassandra 3.7 LTS on github here (
 https://github.com/instaclustr/cassandra). You can read more of our
 thinking and how this applies to our managed service here (
 https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).

 We also have an expanded FAQ about why and how we are approaching 3.x
 in this manner (https://github.com/instaclust
 r/cassandra#cassandra-37-lts), however I've included the top few
 question and answers below:

 *Is this a fork?*
 No, this is just Cassandra with a different release cadence for those
 who want 3.x features but are slightly more risk averse than the current
 schedule allows.

 *Why not just use the official release?*
 With the 3.x tick-tock branch we have encountered more instability than
 with the previous release cadence. We feel that releasing new features
 every other release makes it very hard for operators to stabilize their
 production environment without bringing in brand new features that are not
 battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
 bug fix branch included new and real-world untested features, specifically
 CDC. We have decided to stick with Cassandra 3.7 and instead backport
 critical issues and maintain it ourselves rather than trying to stick with
 the current Apache Cassandra release cadence.

 *Why backport?*
 At Instaclustr we support and run a number of different versions of
 Apache Cassandra on behalf of our customers. Over the course of managing
 Cassandra for our customers we often encounter bugs. There are existing
 patches for some of them, others we patch ourselves. Generally, if we can,
 we try to wait for the next official Apache Cassandra release; however, when
 needed to ensure our customers remain stable and running, we will
 sometimes backport bug fixes and write our own hotfixes (which are also
 submitted back to the community).

 *Why release it?*
 A number of our customers and people in the community have asked if we
 would make this available, which we are more than happy to do. This
 repository represents what Instaclustr runs in production for Cassandra 3.7
 and this is our way of helping the community get a similar level of
 stability as what you would get from our managed service.

 Cheers

 Ben



 --
 Ben Bromhead
 CTO | Instaclustr 
 +1 650 284 9692
 Managed Cassandra / Spark on AWS, Azure and Softlayer

>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: failing bootstraps with OOM

2016-11-02 Thread Vladimir Yudovin
Hi,



Probably you can try to start the new node with auto_bootstrap: false and then
repair keyspaces, or even tables, one by one with nodetool repair.
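
Something like this (Scala, sketch only: it assumes nodetool is on the PATH and
the keyspace names are placeholders) is all it takes to walk through them one
by one:

import scala.sys.process._

// Sketch: repair keyspaces one at a time on the new node after it has
// joined with auto_bootstrap: false.
object RepairOneByOne {
  def main(args: Array[String]): Unit = {
    val keyspaces = Seq("my_keyspace_1", "my_keyspace_2") // placeholders
    keyspaces.foreach { ks =>
      println(s"Repairing $ks ...")
      val exit = Seq("nodetool", "repair", ks).! // blocks until the repair finishes
      require(exit == 0, s"nodetool repair $ks exited with $exit")
    }
  }
}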



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





On Wed, 02 Nov 2016 10:35:45 -0400, Mike Torra mto...@demandware.com wrote:




Hi All -



I am trying to bootstrap a replacement node in a cluster, but it consistently 
fails to bootstrap because of OOM exceptions. For almost a week I've been going 
through cycles of bootstrapping, finding errors, then restarting / resuming 
bootstrap, and I am struggling to move forward. Sometimes the bootstrapping 
node itself fails, which usually manifests first as very high GC times 
(sometimes 30s+!), then nodetool commands start to fail with timeouts, then the 
node will crash with an OOM exception. Other times, a node streaming data to 
this bootstrapping node will have a similar failure. In either case, when it 
happens I need to restart the crashed node, then resume the bootstrap.



On top of these issues, when I do need to restart a node it takes a long
time 
(http://stackoverflow.com/questions/40141739/why-does-cassandra-sometimes-take-a-hours-to-start).
 This exacerbates the problem because it takes so long to find out if a change
to the cluster helps or if it still fails. I am in the process of upgrading all 
nodes in the cluster from m4.xlarge to c4.4xlarge, and I am running Cassandra 
DDC 3.5 on all nodes. The cluster has 26 nodes spread across 4 regions in EC2. 
Here is some other relevant cluster info (also in stack overflow post):



Cluster Info

Cassandra DDC 3.5

EC2MultiRegionSnitch

m4.xlarge, moving to c4.4xlarge

Schema Info

3 CF's, all 'write once' (ie no updates), 1 week ttl, STCS (default)

no secondary indexes

I am unsure what to try next. The node that is currently having this bootstrap 
problem is a pretty beefy box, with 16 cores, 30G of ram, and a 3.2T EBS 
volume. The slow startup time might be because of the issues with a high number 
of SSTables that Jeff Jirsa mentioned in a comment on the SO post, but I am at 
a loss for the OOM issues. I've tried:


Changing from CMS to G1 GC, which seemed to have helped a bit

Upgrading from 3.5 to 3.9, which did not seem to help

Upgrading instance types from m4.xlarge to c4.4xlarge, which seems to help, but 
I'm still having issues

I'd appreciate any suggestions on what else I can try to track down the cause 
of these OOM exceptions.



- Mike








Re: Custom receiver for WebSocket in Spark not working

2016-11-02 Thread Kant Kodali
I don't see a store() call in your receive().

Search for store() in here
http://spark.apache.org/docs/latest/streaming-custom-receivers.html
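
A sketch of the receive() from the code below with that call added (this goes
inside the existing WebSocketReceiver class; store() is inherited from
Receiver):

  private def receive(): Unit = {
    val connection = WebSocket().open("ws://localhost:3001")
    setWebSocket(connection)

    connection.listener(new TextListener {
      override def onMessage(message: String): Unit = {
        // Hand each WebSocket message to Spark Streaming; without this
        // the DStream never receives any data.
        store(message)
      }
    })
  }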

On Wed, Nov 2, 2016 at 10:23 AM, Cassa L  wrote:

> Hi,
> I am using spark 1.6. I wrote a custom receiver to read from WebSocket.
> But when I start my spark job, it  connects to the WebSocket but  doesn't
> get any message. Same code, if I write as separate scala class, it works
> and prints messages from WebSocket. Is anything missing in my Spark Code?
> There are no errors in spark console.
>
> Here is my receiver -
>
> import org.apache.spark.Logging
> import org.apache.spark.storage.StorageLevel
> import org.apache.spark.streaming.receiver.Receiver
> import org.jfarcand.wcs.{MessageListener, TextListener, WebSocket}
>
> /**
>   * Custom receiver for WebSocket
>   */
> class WebSocketReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) 
> with Runnable with Logging {
>
>   private var webSocket: WebSocket = _
>
>   @transient
>   private var thread: Thread = _
>
>   override def onStart(): Unit = {
> thread = new Thread(this)
> thread.start()
>   }
>
>   override def onStop(): Unit = {
> setWebSocket(null)
> thread.interrupt()
>   }
>
>   override def run(): Unit = {
> println("Received ")
> receive()
>   }
>
>   private def receive(): Unit = {
>
>
> val connection = WebSocket().open("ws://localhost:3001")
> println("WebSocket  Connected ..." )
> println("Connected --- " + connection)
> setWebSocket(connection)
>
>connection.listener(new TextListener {
>
>  override def onMessage(message: String) {
>  System.out.println("Message in Spark client is --> " + 
> message)
>}
> })
>
>
> }
>
> private def setWebSocket(newWebSocket: WebSocket) = synchronized {
> if (webSocket != null) {
> webSocket.shutDown
> }
> webSocket = newWebSocket
> }
>
> }
>
>
> =
>
> Here is code for Spark job
>
>
> object WebSocketTestApp {
>
>   def main(args: Array[String]) {
> val conf = new SparkConf()
>   .setAppName("Test Web Socket")
>   .setMaster("local[20]")
>   .set("test", "")
> val ssc = new StreamingContext(conf, Seconds(5))
>
>
> val stream: ReceiverInputDStream[String] = ssc.receiverStream(new 
> WebSocketReceiver())
> stream.print()
>
> ssc.start()
> ssc.awaitTermination()
>   }
>
>
> ==
> }
>
>
> Thanks,
>
> LCassa
>
>


Re: Introducing Cassandra 3.7 LTS

2016-11-02 Thread Benjamin Roth
You can build one on your own very easily. Just check out the desired git
repo and do this:

http://stackoverflow.com/questions/8989192/how-to-package-the-cassandra-source-code-into-debian-package

2016-11-02 17:35 GMT+01:00 Jesse Hodges :

> Just curious, has anybody created a debian package for this?
>
> Thanks, Jesse
>
> On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang  wrote:
>
>> This is awesome! Stability is the king.
>>
>> Thank you so much!
>>
>> On Oct 19, 2016 2:56 PM, "Ben Bromhead"  wrote:
>>
>>> Hi All
>>>
>>> I am proud to announce we are making available our production build of
>>> Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
>>> customers). Our release of Cassandra 3.7 includes a number of backported
>>> patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
>>> include the new features of these releases.
>>>
>>> You can find our release of Cassandra 3.7 LTS on github here (
>>> https://github.com/instaclustr/cassandra). You can read more of our
>>> thinking and how this applies to our managed service here (
>>> https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).
>>>
>>> We also have an expanded FAQ about why and how we are approaching 3.x in
>>> this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts),
>>> however I've included the top few question and answers below:
>>>
>>> *Is this a fork?*
>>> No, this is just Cassandra with a different release cadence for those
>>> who want 3.x features but are slightly more risk averse than the current
>>> schedule allows.
>>>
>>> *Why not just use the official release?*
>>> With the 3.x tick-tock branch we have encountered more instability than
>>> with the previous release cadence. We feel that releasing new features
>>> every other release makes it very hard for operators to stabilize their
>>> production environment without bringing in brand new features that are not
>>> battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
>>> bug fix branch included new and real-world untested features, specifically
>>> CDC. We have decided to stick with Cassandra 3.7 and instead backport
>>> critical issues and maintain it ourselves rather than trying to stick with
>>> the current Apache Cassandra release cadence.
>>>
>>> *Why backport?*
>>> At Instaclustr we support and run a number of different versions of
>>> Apache Cassandra on behalf of our customers. Over the course of managing
>>> Cassandra for our customers we often encounter bugs. There are existing
>>> patches for some of them, others we patch ourselves. Generally, if we can,
>>> we try to wait for the next official Apache Cassandra release; however, when
>>> needed to ensure our customers remain stable and running, we will
>>> sometimes backport bug fixes and write our own hotfixes (which are also
>>> submitted back to the community).
>>>
>>> *Why release it?*
>>> A number of our customers and people in the community have asked if we
>>> would make this available, which we are more than happy to do. This
>>> repository represents what Instaclustr runs in production for Cassandra 3.7
>>> and this is our way of helping the community get a similar level of
>>> stability as what you would get from our managed service.
>>>
>>> Cheers
>>>
>>> Ben
>>>
>>>
>>>
>>> --
>>> Ben Bromhead
>>> CTO | Instaclustr 
>>> +1 650 284 9692
>>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Custom receiver for WebSocket in Spark not working

2016-11-02 Thread Cassa L
Hi,
I am using spark 1.6. I wrote a custom receiver to read from WebSocket. But
when I start my spark job, it  connects to the WebSocket but  doesn't get
any message. Same code, if I write as separate scala class, it works and
prints messages from WebSocket. Is anything missing in my Spark Code? There
are no errors in spark console.

Here is my receiver -

import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.jfarcand.wcs.{MessageListener, TextListener, WebSocket}

/**
  * Custom receiver for WebSocket
  */
class WebSocketReceiver extends
Receiver[String](StorageLevel.MEMORY_ONLY) with Runnable with Logging
{

  private var webSocket: WebSocket = _

  @transient
  private var thread: Thread = _

  override def onStart(): Unit = {
thread = new Thread(this)
thread.start()
  }

  override def onStop(): Unit = {
setWebSocket(null)
thread.interrupt()
  }

  override def run(): Unit = {
println("Received ")
receive()
  }

  private def receive(): Unit = {


val connection = WebSocket().open("ws://localhost:3001")
println("WebSocket  Connected ..." )
println("Connected --- " + connection)
setWebSocket(connection)

   connection.listener(new TextListener {

 override def onMessage(message: String) {
 System.out.println("Message in Spark client is --> " + message)
   }
})


}

private def setWebSocket(newWebSocket: WebSocket) = synchronized {
if (webSocket != null) {
webSocket.shutDown
}
webSocket = newWebSocket
}

}


=

Here is code for Spark job


object WebSocketTestApp {

  def main(args: Array[String]) {
val conf = new SparkConf()
  .setAppName("Test Web Socket")
  .setMaster("local[20]")
  .set("test", "")
val ssc = new StreamingContext(conf, Seconds(5))


val stream: ReceiverInputDStream[String] = ssc.receiverStream(new
WebSocketReceiver())
stream.print()

ssc.start()
ssc.awaitTermination()
  }


==
}


Thanks,

LCassa


Re: failing bootstraps with OOM

2016-11-02 Thread Oleksandr Shulgin
On Wed, Nov 2, 2016 at 3:35 PM, Mike Torra  wrote:
>
> Hi All -
>
> I am trying to bootstrap a replacement node in a cluster, but it
consistently fails to bootstrap because of OOM exceptions. For almost a
week I've been going through cycles of bootstrapping, finding errors, then
restarting / resuming bootstrap, and I am struggling to move forward.
Sometimes the bootstrapping node itself fails, which usually manifests
first as very high GC times (sometimes 30s+!), then nodetool commands start
to fail with timeouts, then the node will crash with an OOM exception.
Other times, a node streaming data to this bootstrapping node will have a
similar failure. In either case, when it happens I need to restart the
crashed node, then resume the bootstrap.
>
> On top of these issues, when I do need to restart a node it takes a
long time (
http://stackoverflow.com/questions/40141739/why-does-cassandra-sometimes-take-a-hours-to-start).
This exacerbates the problem because it takes so long to find out if a
change to the cluster helps or if it still fails. I am in the process of
upgrading all nodes in the cluster from m4.xlarge to c4.4xlarge, and I am
running Cassandra DDC 3.5 on all nodes. The cluster has 26 nodes spread
across 4 regions in EC2. Here is some other relevant cluster info (also in
stack overflow post):
>
> Cluster Info
>
> Cassandra DDC 3.5
> EC2MultiRegionSnitch
> m4.xlarge, moving to c4.4xlarge
>
> Schema Info
>
> 3 CF's, all 'write once' (ie no updates), 1 week ttl, STCS (default)
> no secondary indexes
>
> I am unsure what to try next. The node that is currently having this
bootstrap problem is a pretty beefy box, with 16 cores, 30G of ram, and a
3.2T EBS volume. The slow startup time might be because of the issues with
a high number of SSTables that Jeff Jirsa mentioned in a comment on the SO
post, but I am at a loss for the OOM issues. I've tried:
>
> Changing from CMS to G1 GC, which seemed to have helped a bit
> Upgrading from 3.5 to 3.9, which did not seem to help
> Upgrading instance types from m4.xlarge to c4.4xlarge, which seems to
help, but I'm still having issues
>
> I'd appreciate any suggestions on what else I can try to track down the
cause of these OOM exceptions.

Hi,

Do you monitor pending compactions and actual number of SSTable files?

On startup Cassandra needs to touch most of the data files and also seems
to keep some metadata about every relevant file in memory.  We once went
into a situation where we ended up with hundreds of thousands of files per
node, which resulted in OOMs on every other node of the ring, and startup
time was over half an hour (this was on version 2.1).

If you have many more files than you expect, then you should check and
adjust your concurrent_compactors and compaction_throughput_mb_per_sec
settings.  Increase concurrent_compactors if you're behind (the pending
compactions metric is a hint) and consider un-throttling compaction until
the situation is back to normal.
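
For reference, a small Scala sketch of watching the backlog and un-throttling.
Assumptions: nodetool is on the PATH, compactionstats prints a "pending tasks:
N" line, and the threshold is arbitrary:

import scala.sys.process._

// Sketch: read the pending-compactions backlog and drop the throughput
// cap while the node is behind.
object CompactionWatch {
  def pendingCompactions(): Int = {
    val out = Seq("nodetool", "compactionstats").!!
    out.split("\n")
      .find(_.toLowerCase.startsWith("pending tasks"))
      .map(_.split(":").last.trim.split("\\s+").head.toInt)
      .getOrElse(0)
  }

  def main(args: Array[String]): Unit = {
    if (pendingCompactions() > 100) {
      // "0" removes the compaction throughput cap entirely
      Seq("nodetool", "setcompactionthroughput", "0").!
    }
  }
}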

Cheers,
--
Alex


Re: Introducing Cassandra 3.7 LTS

2016-11-02 Thread Jesse Hodges
Just curious, has anybody created a debian package for this?

Thanks, Jesse

On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang  wrote:

> This is awesome! Stability is the king.
>
> Thank you so much!
>
> On Oct 19, 2016 2:56 PM, "Ben Bromhead"  wrote:
>
>> Hi All
>>
>> I am proud to announce we are making available our production build of
>> Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
>> customers). Our release of Cassandra 3.7 includes a number of backported
>> patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
>> include the new features of these releases.
>>
>> You can find our release of Cassandra 3.7 LTS on github here (
>> https://github.com/instaclustr/cassandra). You can read more of our
>> thinking and how this applies to our managed service here (
>> https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).
>>
>> We also have an expanded FAQ about why and how we are approaching 3.x in
>> this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts),
>> however I've included the top few question and answers below:
>>
>> *Is this a fork?*
>> No, this is just Cassandra with a different release cadence for those who
>> want 3.x features but are slightly more risk averse than the current
>> schedule allows.
>>
>> *Why not just use the official release?*
>> With the 3.x tick-tock branch we have encountered more instability than
>> with the previous release cadence. We feel that releasing new features
>> every other release makes it very hard for operators to stabilize their
>> production environment without bringing in brand new features that are not
>> battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
>> bug fix branch included new and real-world untested features, specifically
>> CDC. We have decided to stick with Cassandra 3.7 and instead backport
>> critical issues and maintain it ourselves rather than trying to stick with
>> the current Apache Cassandra release cadence.
>>
>> *Why backport?*
>> At Instaclustr we support and run a number of different versions of
>> Apache Cassandra on behalf of our customers. Over the course of managing
>> Cassandra for our customers we often encounter bugs. There are existing
>> patches for some of them, others we patch ourselves. Generally, if we can,
>> we try to wait for the next official Apache Cassandra release; however, when
>> needed to ensure our customers remain stable and running, we will
>> sometimes backport bug fixes and write our own hotfixes (which are also
>> submitted back to the community).
>>
>> *Why release it?*
>> A number of our customers and people in the community have asked if we
>> would make this available, which we are more than happy to do. This
>> repository represents what Instaclustr runs in production for Cassandra 3.7
>> and this is our way of helping the community get a similar level of
>> stability as what you would get from our managed service.
>>
>> Cheers
>>
>> Ben
>>
>>
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>
>


failing bootstraps with OOM

2016-11-02 Thread Mike Torra
Hi All -

I am trying to bootstrap a replacement node in a cluster, but it consistently 
fails to bootstrap because of OOM exceptions. For almost a week I've been going 
through cycles of bootstrapping, finding errors, then restarting / resuming 
bootstrap, and I am struggling to move forward. Sometimes the bootstrapping 
node itself fails, which usually manifests first as very high GC times 
(sometimes 30s+!), then nodetool commands start to fail with timeouts, then the 
node will crash with an OOM exception. Other times, a node streaming data to 
this bootstrapping node will have a similar failure. In either case, when it 
happens I need to restart the crashed node, then resume the bootstrap.

On top of these issues, when I do need to restart a node it takes a long
time 
(http://stackoverflow.com/questions/40141739/why-does-cassandra-sometimes-take-a-hours-to-start).
 This exacerbates the problem because it takes so long to find out if a change
to the cluster helps or if it still fails. I am in the process of upgrading all 
nodes in the cluster from m4.xlarge to c4.4xlarge, and I am running Cassandra 
DDC 3.5 on all nodes. The cluster has 26 nodes spread across 4 regions in EC2. 
Here is some other relevant cluster info (also in stack overflow post):

Cluster Info

  *   Cassandra DDC 3.5
  *   EC2MultiRegionSnitch
  *   m4.xlarge, moving to c4.4xlarge

Schema Info

  *   3 CF's, all 'write once' (ie no updates), 1 week ttl, STCS (default)
  *   no secondary indexes

I am unsure what to try next. The node that is currently having this bootstrap 
problem is a pretty beefy box, with 16 cores, 30G of ram, and a 3.2T EBS 
volume. The slow startup time might be because of the issues with a high number 
of SSTables that Jeff Jirsa mentioned in a comment on the SO post, but I am at 
a loss for the OOM issues. I've tried:

  *   Changing from CMS to G1 GC, which seemed to have helped a bit
  *   Upgrading from 3.5 to 3.9, which did not seem to help
  *   Upgrading instance types from m4.xlarge to c4.4xlarge, which seems to 
help, but I'm still having issues

I'd appreciate any suggestions on what else I can try to track down the cause 
of these OOM exceptions.

- Mike