RE: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Durity, Sean R
What is the data problem that you are trying to solve with Cassandra? Is it high availability? Low latency queries? Large data volumes? High concurrent users? I would design the solution to fit the problem(s) you are solving. For example, if high availability is the goal, I would be very

RE: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Durity, Sean R
If you are just trying to get a sense of the data, you could try adding a LIMIT clause to reduce the number of results and hopefully beat the timeout. However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND CLUSTER." It means the data model does not support the query and

RE: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-11 Thread Durity, Sean R
https://issues.apache.org/jira/browse/CASSANDRA-9620 has something similar that was determined to be a driver error. I would start with looking at the driver version and also the RetryPolicy that is in effect for the Cluster. Secondly, I would look at whether a batch is really needed for the

RE: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Durity, Sean R
My first suspicion would be to look at the server times in the cluster. It looks like other cases where a write occurs (with no errors) but the data is not retrieved as expected. If the write occurs with an earlier timestamp than the existing data, this is the behavior you would see. The write
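To illustrate why a write stamped with an earlier timestamp seems to vanish, here is a minimal sketch of last-write-wins reconciliation. It assumes each cell version is just a value plus a write timestamp in microseconds. The `Cell` type and `reconcile` function are hypothetical, and the sketch ignores tombstones and Cassandra's own tie-break rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds, taken from whichever clock assigned it


def reconcile(existing: Cell, incoming: Cell) -> Cell:
    """Last-write-wins: keep the version with the higher write timestamp.

    Simplification: on a timestamp tie the existing version is kept here; Cassandra has
    its own tie-break rules that this sketch does not model.
    """
    return incoming if incoming.timestamp_us > existing.timestamp_us else existing


# The stored value was written with a correct clock.
stored = Cell("old-value", timestamp_us=1_555_000_000_000_000)
# The update is stamped by a clock running 5 seconds behind.
late_update = Cell("new-value", timestamp_us=stored.timestamp_us - 5_000_000)

# The update is accepted without any error, yet reads keep returning the old value.
print(reconcile(stored, late_update).value)  # old-value
```

Checking clock synchronization (e.g. NTP) on every node and client that assigns timestamps is the practical follow-up.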

RE: [EXTERNAL] Re: Garbage Collector

2019-03-19 Thread Durity, Sean R
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB for the JVM). That has worked in just about every case I’m familiar with. In the old days we used CMS, but tuning that beast is a black art with few wizards available (though several on this mailing list). Today, I
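As a worked example of the "G1 with half of RAM" default above, here is a small hypothetical helper that turns a host's RAM into JVM flags. Setting -Xms equal to -Xmx is an assumption of this sketch rather than something stated in the post; the flags would go in jvm.options or cassandra-env.sh depending on the version.

```python
def g1_heap_gb(physical_ram_gb: int) -> int:
    """Rule of thumb from the post: give the G1-collected JVM half of the host's RAM."""
    return physical_ram_gb // 2


def heap_flags(physical_ram_gb: int) -> list:
    """JVM flags for G1 with the heap pinned (-Xms == -Xmx) at half of RAM.

    Pinning min and max to the same value is an assumption of this sketch.
    """
    heap = g1_heap_gb(physical_ram_gb)
    return ["-XX:+UseG1GC", f"-Xms{heap}G", f"-Xmx{heap}G"]


for ram_gb in (32, 48):
    print(ram_gb, heap_flags(ram_gb))
```

A 32 GB host lands on the 16 GB heap the post describes as a typical minimum.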

RE: [EXTERNAL] Re: Default TTL on CF

2019-03-14 Thread Durity, Sean R
I spent a month of my life on a similar problem. There wasn't an easy answer, but this is what I did: #1 - Stop the problem from growing further. Get new inserts using a TTL (or set the default on the table so they get it). The app team had to do this one. #2 - Delete any data that should already
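The first two steps above can be sketched as follows, assuming a 30-day retention and hypothetical keyspace/table names (`app_ks.events`). The ALTER statement sets the table-level `default_time_to_live` (in seconds), which only applies to new writes; the predicate identifies pre-existing rows that would already have expired had the TTL been in place.

```python
from datetime import datetime, timedelta, timezone


def default_ttl_cql(keyspace: str, table: str, ttl: timedelta) -> str:
    """Step #1: make new inserts pick up a table-level default TTL (seconds)."""
    return (f"ALTER TABLE {keyspace}.{table} "
            f"WITH default_time_to_live = {int(ttl.total_seconds())};")


def should_already_be_gone(written_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """Step #2: an old row is overdue for deletion if written_at + ttl is in the past."""
    return written_at + ttl <= now


ttl = timedelta(days=30)
now = datetime(2019, 3, 14, tzinfo=timezone.utc)
print(default_ttl_cql("app_ks", "events", ttl))
print(should_already_be_gone(datetime(2019, 1, 1, tzinfo=timezone.utc), ttl, now))
```

How the write time of old rows is obtained (for example from a timestamp column in the data) depends on the schema and is left open here.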

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Durity, Sean R
tl set on each record as the first table? From: Durity, Sean R <sean_r_dur...@homedepot.com> Sent: Wednesday, March 13, 2019 8:17 AM To: user@cassandra.apache.org Subject: RE: [EXTERNAL] Re: Migrate large volume of data from one tabl

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
Rebuild the DCs with a new number of vnodes… I have done it. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 2:09 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Cluster size "limit" Is not possible with an existing cluster! Le mer. 13 mars 2019 à 18:39, Duri

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
If you can change to 8 vnodes, it will be much better for repairs and other kinds of streaming operations. The old advice of 256 per node is now not very helpful. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 1:27 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cluster size
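One rough way to see why fewer vnodes helps repairs and streaming is to count token ranges, since each range is a separate unit of work. The sketch below assumes a hypothetical 100-node cluster with RF=3 and approximates the ranges a node holds data for as num_tokens × RF; it is an order-of-magnitude illustration, not a model of repair scheduling.

```python
def ring_ranges(nodes: int, num_tokens: int) -> int:
    """Each node places num_tokens tokens on the ring, giving nodes * num_tokens ranges."""
    return nodes * num_tokens


def replicated_ranges_per_node(num_tokens: int, rf: int) -> int:
    """Approximation: a node holds data for about num_tokens * rf ranges (its own plus replicas)."""
    return num_tokens * rf


for vnodes in (256, 8):
    print(vnodes, ring_ranges(100, vnodes), replicated_ranges_per_node(vnodes, 3))
```

Going from 256 to 8 vnodes cuts the number of ranges per node by a factor of 32, which is roughly the amount of per-range overhead that disappears.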

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Durity, Sean R
egards On Wed, 13 Mar 2019 at 06:57, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: If there are 2 access patterns, I would consider having 2 tables. The first one with the ID, which you say is the majority use case. Then have a second table that uses a time-bucket approach as

RE: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Durity, Sean R
If there are 2 access patterns, I would consider having 2 tables. The first one with the ID, which you say is the majority use case. Then have a second table that uses a time-bucket approach as others have suggested: (time bucket, id) as primary key Choose a time bucket (day, week, hour, month,
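The time-bucket idea can be sketched as a function that maps a timestamp to a bucket label, assuming the bucket is used as the partition key of the second table and the ID as a clustering column. The function names and label formats are hypothetical. A time-range query then visits one partition per bucket in the range.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterator


def time_bucket(ts: datetime, granularity: str) -> str:
    """Label for the partition that `ts` falls into, at the chosen granularity."""
    if granularity == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    formats = {"hour": "%Y-%m-%d-%H", "day": "%Y-%m-%d", "month": "%Y-%m"}
    return ts.strftime(formats[granularity])


def day_buckets(start: datetime, end: datetime) -> Iterator[str]:
    """Every daily bucket a [start, end] range query must visit, one partition each."""
    day: date = start.date()
    while day <= end.date():
        yield day.isoformat()
        day += timedelta(days=1)


ts = datetime(2019, 3, 12, 14, 30, tzinfo=timezone.utc)
print(time_bucket(ts, "day"))
print(list(day_buckets(ts - timedelta(days=1), ts)))
```

Coarser buckets mean fewer partitions per query but larger partitions; the choice depends on write volume per bucket.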

RE: [EXTERNAL] Re: A Question About Hints

2019-03-05 Thread Durity, Sean R
Versions 2.0 and 2.1 were generally very stable, so I can understand a reticence to move when there are so many other things competing for time and attention. Sean Durity From: shalom sagges Sent: Monday, March 04, 2019 4:21 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: A

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
months in different places and both times recovery was difficult and hazardous. I still strongly recommend against it. On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: We use the PropertyFileSnitch precisely because it is the same on every node. I

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
We use the PropertyFileSnitch precisely because it is the same on every node. If each node has to have a different file (for GPFS) – deployment is more complicated. (And for any automated configuration you would have a list of hosts and DC/rack information to compile anyway) I do put UNKNOWN
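Since the PropertyFileSnitch file is identical on every node, it can be rendered once from a host inventory and pushed everywhere. Below is a minimal sketch that emits `IP=DC:RACK` lines for cassandra-topology.properties; the inventory structure and the `render_topology` name are hypothetical, and no `default=` line is generated.

```python
from typing import Dict, Tuple


def render_topology(hosts: Dict[str, Tuple[str, str]]) -> str:
    """Render cassandra-topology.properties lines (IP=DC:RACK) from one shared inventory.

    Every node gets the identical output, so the same file can be deployed everywhere.
    """
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in sorted(hosts.items())]
    return "\n".join(lines) + "\n"


inventory = {
    "10.0.0.2": ("DC1", "RAC1"),
    "10.0.0.1": ("DC1", "RAC1"),
    "10.1.0.1": ("DC2", "RAC1"),
}
print(render_topology(inventory), end="")
```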

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Durity, Sean R
This has not been my experience. Changing a node's IP address is one of the worst admin tasks for Cassandra. system.peers and other information on each node is stored by IP address. And gossip is really good at sending around the old information mixed with new… Sean Durity From: Oleksandr Shulgin

RE: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Durity, Sean R
Agreed. It’s pretty close to impossible to administrate your way out of a data model that doesn’t play to Cassandra’s strengths. Which is true for other data storage technologies – you need to model the data the way that the engine is designed to work. Sean Durity From: DuyHai Doan Sent:

RE: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Durity, Sean R
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP timeouts from firewalls or switches. The default tcp_keepalive kernel parameters are usually not tuned for that. See https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html for more

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –

RE: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Durity, Sean R
I will start – knowing that others will have additional help/questions. What heap size are you using? Sounds like you are using the CMS garbage collector. That takes some arcane knowledge and lots of testing to tune. I would start with G1 and using ½ the available RAM as the heap size. I would

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
: Dor Laor Sent: Wednesday, January 09, 2019 11:23 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: I think you could consider op

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
center. Does that cause any overhead? On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: I think you could consider option C: Create a (new) analytics DC in Cassandra and run your Spark nodes there. Then you can address the scaling just on that DC. You ca

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Durity, Sean R
I think you could consider option C: Create a (new) analytics DC in Cassandra and run your Spark nodes there. Then you can address the scaling just on that DC. You can also use fewer vnodes, only replicate certain keyspaces, etc. in order to perform the analytics more efficiently. Sean Durity

RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only some columns are updated. Others should keep their original values. You are correct that inserting null creates a tombstone. Can you only insert the columns that actually have new values? Just skip the columns with
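A minimal sketch of "only insert the columns that actually have new values": build a parameterized CQL INSERT that leaves out any column whose value is None, so no null (and therefore no tombstone) is written for it. The `build_insert` helper and the table/column names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized CQL INSERT that omits columns whose value is None.

    Binding None would write a null (a tombstone); leaving the column out keeps its
    previously stored value untouched.
    """
    present = {col: val for col, val in row.items() if val is not None}
    cols = ", ".join(present)
    marks = ", ".join("?" for _ in present)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(present.values())


stmt, values = build_insert("ks.events", {"id": 1, "status": "shipped", "note": None})
print(stmt)
print(values)
```

One trade-off: each distinct combination of present columns yields a different statement, so an application preparing statements may end up caching several variants.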

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Durity, Sean R
it to Cassandra again. This way I can store almost all my data, but when the problem is the read, I don't apply any Retry policy (but this is my problem). Thanks, Marco On Fri, Dec 21, 2018 at 17:18, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: Can you provide the

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-21 Thread Durity, Sean R
Can you provide the schema and the queries? What is the RF of the keyspace for the data? Are you using any Retry policy on your Cluster object? Sean Durity From: Marco Gasparini Sent: Friday, December 21, 2018 10:45 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Writes and Reads with

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-05 Thread Durity, Sean R
the new binary and restart the node to a newer version as quickly as possible. upgradesstables is I/O intensive and it takes time and is proportional to the data on the node. Given these constraints, is there a risk due to prolonged upgradesstables? On Tue, Dec 4, 2018 at 12:20 PM Durity, Sean R

RE: [EXTERNAL] Cassandra Upgrade Plan 2.2.4 to 3.11.3

2018-12-04 Thread Durity, Sean R
See my recent post for some additional points. But I wanted to encourage you to look at the in-place upgrade on your existing hardware. No need to add a DC to try and upgrade. The cluster will handle reads and writes with nodes of different versions – no problems. I have done this many times on

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Durity, Sean R
We have had great success with Cassandra upgrades with applications staying online. It is one of the strongest benefits of Cassandra. A couple of things I incorporate into upgrades: - The main task is getting the new binaries loaded, then restarting the node – in a rolling fashion. Get

RE: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Durity, Sean R
I think you are asking about *encryption* at rest. To my knowledge, open source Cassandra does not support this natively. There are options, like encrypting the data in the application before it gets to Cassandra. Some companies offer other solutions. IMO, if you need the increased security, it

RE: [EXTERNAL] Re: Multiple cluster for a single application

2018-11-08 Thread Durity, Sean R
We have a cluster over 100 nodes that performs just fine for its use case. In our case, we needed the disk space and did not want the admin headache of very dense nodes. It does take more automation and process to handle a larger cluster, but those are all good things to solve anyway. But

RE: Cassandra 2.1 bootstrap - No streaming progress from one node

2018-11-07 Thread Durity, Sean R
I would wipe the new node and bootstrap again. I do not know of any way to resume the streaming that was previously in progress. Sean Durity From: Steinmaurer, Thomas Sent: Wednesday, November 07, 2018 5:13 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Cassandra 2.1 bootstrap - No

RE: [EXTERNAL] Re: rolling version upgrade, upgradesstables, and vulnerability window

2018-10-30 Thread Durity, Sean R
Just to pile on: I agree. On our upgrades, I always aim to get the binary part done on all nodes before worrying about upgradesstables. The binary upgrade is one node at a time (as a precaution). How I run upgradesstables depends on cluster size, data size, compaction throughput, etc. I usually start with running

RE: [EXTERNAL] Re: [E] Re: nodetool status and node maintenance

2018-10-29 Thread Durity, Sean R
I have wrapped nodetool info into my own script that strips out and interprets the information I care about. That script also sets a return code based on the health of that node (which protocols are up, etc.). Then I can monitor the individual health of the node – as that node sees itself. I
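A wrapper like the one described could look roughly like this: parse the `key : value` lines that `nodetool info` prints and turn a few of them into an exit status. Which fields count as "healthy" is an assumption here (gossip and native transport both active), and the function names are hypothetical.

```python
from typing import Dict

# Assumption: the node is considered healthy only if both of these report "true".
REQUIRED = ("Gossip active", "Native Transport active")


def parse_info(text: str) -> Dict[str, str]:
    """Split each `nodetool info` line on its first colon into a field name and value."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def health_code(fields: Dict[str, str]) -> int:
    """0 if every required protocol reports 'true', 1 otherwise (usable as an exit status)."""
    return 0 if all(fields.get(k) == "true" for k in REQUIRED) else 1


sample = (
    "Gossip active          : true\n"
    "Thrift active          : false\n"
    "Native Transport active: true\n"
    "Load                   : 1.2 GiB\n"
)
print(health_code(parse_info(sample)))
```

Wired into a script, something like `nodetool info | python node_health.py` with `sys.exit(health_code(parse_info(sys.stdin.read())))` would let a monitor see the node's health as that node sees itself.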

RE: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS)

2018-10-23 Thread Durity, Sean R
Agreed. I have run clusters with both RHEL5 and RHEL6 nodes. Sean Durity From: Jeff Jirsa Sent: Sunday, October 14, 2018 12:40 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS) Should be fine, just get the java and

RE: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Durity, Sean R
Thank you. I do want to hear about future conferences. I would also love to hear reports/summaries/highlights from folks who went to Distributed Data Summit (or other conferences). I think user conferences are great! Sean Durity From: Max C. Sent: Friday, October 05, 2018 8:33 PM To:

RE: [EXTERNAL] Re: Rolling back Cassandra upgrades (tarball)

2018-10-01 Thread Durity, Sean R
Version choices aside, I am an advocate for forward-only (in most cases). Here is my reasoning, so that you can evaluate for your situation: - upgrades are done while the application is up and live and writing data (no app downtime) - the upgrade usually includes a change to the sstable version

RE: [EXTERNAL] Re: Adding datacenter and data verification

2018-09-18 Thread Durity, Sean R
You are correct that altering the keyspace replication settings does not actually move any data. It only affects new writes or reads. system_auth is one that needs to be repaired quickly OR, if your number of users/permissions is relatively small, you can just reinsert them after the alter to

RE: [EXTERNAL] Re: cold vs hot data

2018-09-18 Thread Durity, Sean R
The only solution I see is using a logged batch, with a huge overhead and perf hit on the writes. On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: An idea: On initial insert, insert into 2 tables: Hot with short TTL Cold/archive with a

RE: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread Durity, Sean R
An idea: on initial insert, insert into 2 tables: Hot, with a short TTL; Cold/archive, with a longer (or no) TTL. Then your hot data is always in the same table, but being expired. And you can access the archive table only for the rarer circumstances. Then you could have the HOT table on a
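The dual-insert idea can be sketched as generating two CQL statements per record: one into the hot table with a short TTL and one into the archive table with a longer TTL or none. Table names, columns, and the 7-day TTL below are hypothetical. As the reply in this thread points out, the two writes are not atomic unless they go in a logged batch.

```python
from typing import List


def dual_insert_cql(hot_table: str, cold_table: str, columns: List[str],
                    hot_ttl_s: int, cold_ttl_s: int = 0) -> List[str]:
    """Two parameterized INSERTs: hot with a short TTL, cold with a longer TTL (0 = none)."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    hot = f"INSERT INTO {hot_table} ({cols}) VALUES ({marks}) USING TTL {hot_ttl_s}"
    cold = f"INSERT INTO {cold_table} ({cols}) VALUES ({marks})"
    if cold_ttl_s > 0:
        cold += f" USING TTL {cold_ttl_s}"
    return [hot, cold]


for stmt in dual_insert_cql("ks.events_hot", "ks.events_archive", ["id", "payload"],
                            hot_ttl_s=7 * 24 * 3600):
    print(stmt)
```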

RE: [EXTERNAL] Regarding migrating data from Oracle to Cassandra.migrate data from Oracle to Cassandra.

2018-09-05 Thread Durity, Sean R
3 starting points: - DO NOT migrate your tables as they are in Oracle to Cassandra. In most cases, you need a different model for Cassandra - DO take the (free) DataStax Academy courses to learn much more about Cassandra as you dive in. It is a systematic and bite-size

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-04 Thread Durity, Sean R
I would only run the clean-up (on all nodes) after all new nodes are added. I would also look at increasing RF to 3 (and running repair) once there are plenty of nodes. (This is assuming that availability matters and that your queries use QUORUM or LOCAL_QUORUM for consistency level. Longer

RE: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread Durity, Sean R
If you are going to compare vs commercial offerings like Scylla and CosmosDB, you should be looking at DataStax Enterprise. They are moving more quickly than open source (IMO) on adding features and tools that enterprises really need. I think they have some emerging tech for large/dense nodes,

RE: [EXTERNAL] Re: Nodetool refresh v/s sstableloader

2018-08-29 Thread Durity, Sean R
Sstableloader, though, could require a lot more disk space – until compaction can reduce. For example, if your RF=3, you will essentially be loading 3 copies of the data. Then it will get replicated 3 more times as it is being loaded. Thus, you could need up to 9x disk space. Sean Durity
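The 9x figure above is simple arithmetic, sketched below for a hypothetical amount of raw data. The sstables from all RF replicas are loaded, and each loaded row is written to RF replicas in the target, so the transient footprint is up to RF × RF times the raw data. Once compaction merges the duplicates, it settles back toward the usual RF times the raw data.

```python
def sstableloader_worst_case(raw_data_gb: float, rf: int) -> float:
    """Worst-case cluster-wide footprint right after loading, before compaction.

    Every replica's sstables are loaded (rf copies), and each loaded row is then
    written to rf replicas, so up to rf * rf times the raw data.
    """
    loaded_copies = raw_data_gb * rf
    return loaded_copies * rf


def steady_state(raw_data_gb: float, rf: int) -> float:
    """Footprint after compaction has merged the duplicate copies: rf times the raw data."""
    return raw_data_gb * rf


print(sstableloader_worst_case(100, 3), steady_state(100, 3))
```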

RE: [EXTERNAL] Re: Improve data load performance

2018-08-15 Thread Durity, Sean R
Might also help to know: Size of cluster How much data is being loaded (# of inserts/actual data size) Single table or multiple tables? Is this a one-time or occasional load or more frequently? Is the data located in the same physical data center as the cluster? (any network latency?) On the

RE: [EXTERNAL] Re: Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-13 Thread Durity, Sean R
I have definitely seen corruption, especially in system tables, when there are multiple instances of Cassandra running/trying to start. We had an internal tool that was supposed to restart processes (like Cassandra) if they were down, but it often re-checked before Cassandra was fully up and

RE: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassandra

2018-08-09 Thread Durity, Sean R
DataStax Enterprise 6.0 has a new bulk loader tool. DSE is a commercial product, but maybe your needs are worth the investigation. Sean Durity From: Rahul Singh Sent: Tuesday, August 07, 2018 9:37 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to

RE: [EXTERNAL] Re: Cassandra rate dropping over long term test

2018-08-03 Thread Durity, Sean R
I wonder if you are building up tombstones with the deletes. Can you share your data model? Are the deleted rows using the same partition key as new rows? Any warnings in your system.log for reading through too many tombstones? Sean Durity From: Mihai Stanescu Sent: Friday, August 03, 2018

RE: [EXTERNAL] full text search on some text columns

2018-07-31 Thread Durity, Sean R
That sounds like a problem tailor-made for the DataStax Search (embedded SOLR) solution. I think that would be the fastest path to success. Sean Durity From: onmstester onmstester Sent: Tuesday, July 31, 2018 10:46 AM To: user Subject: [EXTERNAL] full text search on some text columns I need

RE: [EXTERNAL] Server kernel Parameters for cassandra

2018-07-30 Thread Durity, Sean R
Here are some to review and test for Cassandra 3.x from DataStax: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html Al Tobey has done extensive work in this area, too. This is dated (Cassandra 2.1), but is worth mining for information:

RE: [EXTERNAL] optimization to cassandra-env.sh

2018-07-26 Thread Durity, Sean R
This is a very good explanation of CMS tuning for Cassandra: http://thelastpickle.com/blog/2018/04/11/gc-tuning.html (author Jon Haddad has extensive Cassandra experience – a super star in our field) Sean Durity From: Durity, Sean R Sent: Thursday, July 26, 2018 2:08 PM To: user

RE: [EXTERNAL] optimization to cassandra-env.sh

2018-07-26 Thread Durity, Sean R
Check the archives for CMS or G1 (whichever garbage collector you are using). There has been significant and good advice on both. In general, though, G1 has one basic number to set and does very well in our use cases. CMS has lots of black art/science tuning and configuration, but you can test

RE: [EXTERNAL] Re: Cassandra recommended server uptime?

2018-07-17 Thread Durity, Sean R
We do not have any scheduled, periodic node restarts. I have been working on Cassandra across many versions, and I have not seen a case where periodic restarts would solve any problem that I saw. There are certainly times when a node needs a restart – but those are because of specific reasons.

RE: [EXTERNAL] New cluster vs Increasing nodes to already existed cluster

2018-07-16 Thread Durity, Sean R
In most cases, we separate clusters by application. This does help with isolating problems. A bad query in one application won’t affect other applications. Also, you can then scale each cluster as required by the data demands. You can also upgrade separately, which may be a huge help. You only

RE: [EXTERNAL] Re: JVM Heap erratic

2018-07-03 Thread Durity, Sean R
THIS! A well-reasoned and clear explanation of a very difficult topic. This is the kind of gold that a user mailing list can provide. Thank you, Alain! Sean Durity From: Alain RODRIGUEZ Sent: Tuesday, July 03, 2018 6:37 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: JVM Heap

RE: [EXTERNAL] Re: consultant recommendations

2018-06-29 Thread Durity, Sean R
I haven’t ever hired a Cassandra consultant, but the company named The Last Pickle (yes, an odd name) has some outstanding Cassandra experts. Not sure how they work, but worth a mention here. Nothing against Instaclustr. There are great folks there, too. Sean Durity From: Evelyn Smith

RE: RE: [EXTERNAL] Cluster is unbalanced

2018-06-19 Thread Durity, Sean R
15839280 On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like acros

RE: [EXTERNAL] Re: Tombstone

2018-06-19 Thread Durity, Sean R
This sounds like a queue pattern, which is typically an anti-pattern for Cassandra. I would say that it is very difficult to get the access patterns, tombstones, and everything else lined up properly to solve a queue problem. Sean Durity From: Abhishek Singh Sent: Tuesday, June 19, 2018

RE: [EXTERNAL] Cluster is unbalanced

2018-06-18 Thread Durity, Sean R
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)? Sean Durity lord of the (C*) rings (Staff Systems
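To illustrate how a few heavy partition keys can unbalance nodes even when keys hash evenly, here is a toy simulation. It uses an MD5-modulo placement as a stand-in: this is NOT Cassandra's Murmur3 token ring, and replication is ignored. The partition sizes are made up.

```python
import hashlib
from collections import Counter
from typing import Dict


def owner(partition_key: str, nodes: int) -> int:
    """Stand-in placement: MD5 of the key modulo node count (not Cassandra's real ring)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def load_per_node(partition_sizes: Dict[str, int], nodes: int) -> Counter:
    """Sum partition sizes per owning node."""
    load: Counter = Counter()
    for key, size in partition_sizes.items():
        load[owner(key, nodes)] += size
    return load


sizes = {f"customer-{i}": 1 for i in range(1000)}  # many small partitions
sizes["customer-big"] = 5000                        # one partition holding most of the data
load = load_per_node(sizes, 6)
print(max(load.values()) / (sum(load.values()) / 6))  # hottest node vs. average
```

The keys themselves spread evenly, but whichever node owns `customer-big` carries several times the average load, which is the kind of imbalance a partition-key review would reveal.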

RE: [EXTERNAL] Re: apache-cassandra 2.2.8 rpm

2018-06-11 Thread Durity, Sean R
> Finally, can I run mixed DataStax and Apache nodes in the same cluster, same version? > Thank you for all your help. I have run DSE and Apache Cassandra in the same cluster while migrating to DSE. The versions of Cassandra were the same. It was relatively brief: just during the upgrade

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread Durity, Sean R
One of the columns you are selecting is a list or map or other kind of collection. You can’t do that with an IN clause against a clustering column. Either don’t select the collection column OR don’t use the IN clause. Cassandra is trying to protect itself (and you) from a query that won’t scale

RE: [EXTERNAL] Re: Error after 3.1.0 to 3.11.2 upgrade

2018-05-14 Thread Durity, Sean R
A couple additional things: - Make sure that you ran repair on the system_auth keyspace on all nodes after changing the RF - If you are not often changing roles/permissions, you might look to increase permissions_validity_in_ms and roles_validity_in_ms so they are not being

RE: [EXTERNAL] Cassandra limitations

2018-05-04 Thread Durity, Sean R
The issue is more with the number of tables, not the number of keyspaces. Because each table has a memtable, there is a practical limit to the number of memtables that a node can hold in its memory. (And scaling out doesn’t help, because every node still has a memtable for every table.) The

RE: [EXTERNAL] Re: Cassandra reaper

2018-04-26 Thread Durity, Sean R
Wait, isn’t this the Apache Cassandra mailing list? Shouldn’t this be on the pickle users list or something? (Just kidding, everyone. I think there should be room for reaper and DataStax inquiries here.) Sean Durity From: Joaquin Casares [mailto:joaq...@thelastpickle.com] Sent: Tuesday,

RE: [EXTERNAL] Re: How to configure Cassandra to NOT use SSLv2?

2018-04-24 Thread Durity, Sean R
I think I would start with the JVM. Sometimes, for export purposes, the cryptography extensions (JCE) are in a separate jar or package from the standard JRE or JVM. I haven’t used the IBM JDK, so I don’t know specifically about that one. Also, perhaps the error is correct – SSLv2Hello is not

RE: [EXTERNAL] Re: Cassandra downgrade version

2018-04-19 Thread Durity, Sean R
This answer surprises me, because I would expect NOT to be able to downgrade if there are any changes in the sstable structure. I assume: - Upgrade is done while the application is up and writing data (so any new data is written in the new format) - Any compactions that

RE: [EXTERNAL] Cassandra vs MySQL

2018-03-20 Thread Durity, Sean R
I’m not sure there is a fair comparison. MySQL and Cassandra have different ways of solving related (but not necessarily the same) problems of storing and retrieving data. The data model between MySQL and Cassandra is likely to be very different. The key for Cassandra is that you need to model

RE: [EXTERNAL] RE: What versions should the documentation support now?

2018-03-14 Thread Durity, Sean R
The DataStax documentation is far superior to the Apache Cassandra attempts. Apache is just poor with holes all over, goofy examples, etc. It would take a team of people working full time to try and catch up with DataStax. I have met the DataStax team; they are doing good work. I think it would

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Durity, Sean R
You cannot migrate and upgrade at the same time across major versions. Streaming is (usually) not compatible between versions. As to the migration question, I would expect that you may need to put the external-facing IP addresses in several places in the cassandra.yaml file. And, yes, it would

RE: [EXTERNAL] Re: Version Rollback

2018-02-28 Thread Durity, Sean R
My short answer is always – there are no rollbacks, we only go forward. Jeff’s answer is much more complete and technically precise. You *could* rollback a few nodes (depending on topology) by just replacing them as if they had died. I always upgrade all nodes (the binaries) as quickly as

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Durity, Sean R
It is instructive to listen to the concerns of new and existing users in order to improve a product like Cassandra, but I think the school yard taunt model isn’t the most effective. In my experience with open and closed source databases, there are always things that could be improved. Many

RE: [EXTERNAL] Re: Even after the drop table, the data actually was not erased.

2018-01-17 Thread Durity, Sean R
We have found it very useful to set up an infrastructure where we can execute a nodetool command (or any other arbitrary command) from a single (non-Cassandra) host that will get executed on each node across the cluster (or a list of nodes). Sean Durity From: Alain RODRIGUEZ
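The run-a-command-everywhere setup could be sketched like this: a single admin host fans a command out over ssh and collects exit codes. It assumes key-based ssh to each node; the function names are hypothetical. The `runner` is injected so the fan-out logic can be exercised without real hosts. In practice it would be something like `lambda argv: subprocess.run(argv).returncode`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def ssh_argv(host: str, command: str) -> List[str]:
    """Argument vector that runs `command` on `host` via ssh (assumes key-based ssh)."""
    return ["ssh", host, command]


def run_everywhere(hosts: Sequence[str], command: str,
                   runner: Callable[[List[str]], int], parallelism: int = 8) -> Dict[str, int]:
    """Run `command` on every host from one admin box; returns each host's exit code."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        codes = pool.map(lambda host: runner(ssh_argv(host, command)), hosts)
        return dict(zip(hosts, codes))
```

Passing `parallelism=1` turns the fan-out into a one-node-at-a-time loop, which suits operations that should not run on every node at once.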

RE: [EXTERNAL] Cassandra cluster add new node slowly

2018-01-03 Thread Durity, Sean R
You don't mention the version, but here are some general suggestions - 2 GB heap is very small for a node, especially with 1 TB+ of data. What is the physical RAM on the host? In general, you want ½ of physical RAM for the JVM. (Look in jvm.options or cassandra-env.sh) - You

RE: [EXTERNAL] 3.0.15 or 3.11.1

2018-01-02 Thread Durity, Sean R
It might help if you let us know which 3.11 features you are interested in. As I hear it, some of the features may not be production-ready (like materialized views). In my opinion, it seems that 3.0.15 is the more stable way to go. However, I have not been testing 3.11, so my thoughts are more based

RE: [EXTERNAL] Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-28 Thread Durity, Sean R
DataStax Enterprise (pay to license) has embedded SOLR search with Cassandra if you don’t want to move the data to another cluster for indexing/searching. Similar to Cassandra modeling, you will need to understand the exact search queries in order to build the SOLR schema to support them. The

RE: [EXTERNAL] Add nodes change

2017-12-28 Thread Durity, Sean R
--> See inline. Hello All, We are going to add 2 new nodes to our production servers, and there are 2 questions on which we would like some advice. 1. In the current production environment, the Cassandra version is 3.0.4; is it OK to use 3.0.15 for the new nodes? --> I would not do this. Streaming between

RE: [EXTERNAL] Lots of simultaneous connections?

2017-12-28 Thread Durity, Sean R
Have you determined if a specific query is the one getting timed out? It is possible that the query/data model does not scale well, especially if you are trying to do something like a full table scan. It is also possible that your OS settings will limit the number of connections to the host.

RE: [EXTERNAL] Bring 2 nodes down

2017-12-28 Thread Durity, Sean R
Decommission the two nodes, one at a time (assumes you have enough disk space on the remaining hosts). That will move the data to the remaining nodes and keep RF=3. Then fix the host. Then add the hosts back into the cluster, one at a time. This is easier with vnodes. Finally, run clean-up on

RE: [EXTERNAL] Re: Any Cassandra Backup and Restore tool like Cassandra Reaper?

2017-12-27 Thread Durity, Sean R
Datos IO solves many of the problems inherent in Cassandra backups (primarily issues with acceptable restores). It is worth considering. Other groups in my company are happy with it. Sean Durity From: Lerh Chuan Low [mailto:l...@instaclustr.com] Sent: Thursday, December 14, 2017 4:13 PM To:

RE: [EXTERNAL] Re: Data Node Density

2017-12-27 Thread Durity, Sean R
You asked for experience; here’s mine. I support one PR cluster where the hardware was built more for HBase than Cassandra. So the data capacity is large (4.5 TB/node). Administratively, it is the worst cluster to work on because any kind of repairs, streaming, replacement take forever. And

RE: [EXTERNAL] Re: Upgrade using rebuild

2017-12-27 Thread Durity, Sean R
The sstable formats/versions are different. Streaming uses those formats. Streaming doesn’t work across major versions (for sure), and I don’t even try it across minor versions. To ensure Cassandra-happiness, follow the rule: For streaming operations (adding nodes, rebuild, repairs, etc.) have
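The "same version before streaming" rule lends itself to a pre-flight check before any bootstrap, rebuild, or repair. This is a minimal sketch that assumes the per-node versions have already been collected (for example from the `ReleaseVersion` line of `nodetool version` on each node); the function name and addresses are hypothetical.

```python
from typing import Dict, Tuple


def streaming_preflight(node_versions: Dict[str, str]) -> Tuple[bool, str]:
    """Gate bootstrap/rebuild/repair on every node reporting the same Cassandra version."""
    versions = sorted(set(node_versions.values()))
    if len(versions) == 1:
        return True, f"ok: all {len(node_versions)} nodes on {versions[0]}"
    return False, "finish the upgrade before streaming; versions seen: " + ", ".join(versions)


print(streaming_preflight({"10.0.0.1": "3.0.4", "10.0.0.2": "3.0.15"}))
```

The check is deliberately strict (any difference, even a patch release, blocks streaming), matching the advice not to stream even across minor versions.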

RE: Pending-range-calculator during bootstrapping

2017-09-22 Thread Durity, Sean R
I don't know a specific issue with these versions, but in general you do not want to do ANY streaming operations (bootstrap or repair) between Cassandra versions. I would get all the nodes (in all DCs) to the same version and then try the bootstrap. Sean Durity From: Peng Xiao

RE: Massive deletes -> major compaction?

2017-09-22 Thread Durity, Sean R
-defined compaction on each sstable in reverse generational order (oldest first) and as long as the data is minimally overlapping it’ll purge tombstones that way as well - takes longer but much less disk involved. -- Jeff Jirsa On Sep 21, 2017, at 11:27 AM, Durity, Sean R <sean_r_dur...@h

RE: Massive deletes -> major compaction?

2017-09-21 Thread Durity, Sean R
ved. -- Jeff Jirsa On Sep 21, 2017, at 11:27 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: Cassandra version 2.0.17 (yes, it’s old – waiting for new hardware/new OS to upgrade) In a long-running system with billions of rows, TTL was not set

Massive deletes -> major compaction?

2017-09-21 Thread Durity, Sean R
Cassandra version 2.0.17 (yes, it's old - waiting for new hardware/new OS to upgrade) In a long-running system with billions of rows, TTL was not set. So a one-time purge is being planned to reduce disk usage. Records older than a certain date will be deleted. The table uses size-tiered

RE: Can I have multiple datacenter with different versions of Cassandra

2017-09-12 Thread Durity, Sean R
No – the general answer is that you cannot stream between major versions of Cassandra. I would upgrade the existing ring, then add the new DC. Sean Durity From: Chuck Reynolds [mailto:creyno...@ancestry.com] Sent: Thursday, May 18, 2017 11:20 AM To: user@cassandra.apache.org Subject: Can I

RE: Reg:- DSE 5.1.0 Issue

2017-09-12 Thread Durity, Sean R
In an attempt to help close the loop for future readers… I don’t think an upgrade from DSE 4.8 straight to 5.1 is supported. I think you have to go through 5.0.x first. And, yes, you should contact DataStax support for help, but I’m ok with DSE-related questions. They may be more

RE: AWS Cassandra backup/Restore tools

2017-09-12 Thread Durity, Sean R
Datos IO has a backup/restore product for Cassandra that another team here has used successfully. It solves many of the problems inherent with sstable captures. Without something like it, restores are a nightmare with any volume of data. The downtime required and the loss of data since the

RE: Getting all unique keys

2017-08-23 Thread Durity, Sean R
DataStax Enterprise bundles Spark and the Spark connector on the DSE nodes and handles much of the plumbing work (and monitoring, etc.). Worth a look. Sean Durity From: Avi Levi [mailto:a...@indeni.com] Sent: Tuesday, August 22, 2017 2:46 AM To: user@cassandra.apache.org Subject: Re: Getting all

RE: Adding a new node with the double of disk space

2017-08-18 Thread Durity, Sean R
I am doing some on-the-job learning on this newer feature of the 3.x line, where the token generation algorithm will compensate for different size nodes in a cluster. In fact, it is one of the main reasons I upgraded to 3.0.13, because I have a number of original nodes in a cluster that are
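
A related and older rule of thumb with vnodes is to give larger nodes a proportionally larger `num_tokens`, so that a node with double the disk owns roughly double the token range. This is not the 3.x allocation algorithm itself, just a hedged illustration of the proportional idea. The node names and the 256 baseline are assumptions, and `num_tokens` only takes effect when a node bootstraps; it cannot be changed on an existing node in place.

```python
def proportional_num_tokens(disk_tb: dict[str, float], base_tokens: int = 256) -> dict[str, int]:
    """Scale num_tokens by disk size relative to the smallest node (rule-of-thumb sketch)."""
    smallest = min(disk_tb.values())
    return {node: round(base_tokens * size / smallest) for node, size in disk_tb.items()}


# Hypothetical: original nodes have 1 TB, the new node has double the disk.
print(proportional_num_tokens({"old-1": 1.0, "old-2": 1.0, "new-1": 2.0}))
# {'old-1': 256, 'old-2': 256, 'new-1': 512}
```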

RE: nodetool removenode causing the schema out of sync

2017-07-13 Thread Durity, Sean R
Late to this party, but Jeff is talking about nodetool setstreamthroughput. The default in most versions is 200 Mb/s (set in yaml file as stream_throughput_outbound_megabits_per_sec). This is outbound throttle only. So, if streams from multiple nodes are going to one, it can get inundated. The
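
Because the throttle applies per sender, the arithmetic for one receiving node is simple: the worst-case inbound rate is the number of streaming peers times the outbound cap. The sketch below assumes every peer streams at its full cap simultaneously. It also computes the per-sender value you might pass to `nodetool setstreamthroughput` (megabits per second) on each peer to keep the receiver near a chosen target.

```python
def inbound_mbps(senders: int, outbound_cap_mbps: float = 200.0) -> float:
    """Worst-case inbound rate at one node when `senders` peers all stream at their cap."""
    return senders * outbound_cap_mbps


def per_sender_cap(target_inbound_mbps: float, senders: int) -> float:
    """Outbound cap per peer that keeps the receiver at roughly the target rate."""
    return target_inbound_mbps / senders


print(inbound_mbps(5))         # 1000.0 Mb/s arriving at one node
print(per_sender_cap(200, 5))  # 40.0, e.g. `nodetool setstreamthroughput 40` on each sender
```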

RE: Node failure Due To Very high GC pause time

2017-07-13 Thread Durity, Sean R
I like Bryan’s terminology of an “antagonistic use case.” If I am reading this correctly, you are putting 5 (or 10) million records in a partition and then trying to delete them in the same order they are stored. This is not a good data model for Cassandra, in fact a dangerous data model. That
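
Roughly why this is dangerous: when rows are deleted from the head of a partition in clustering order, each delete leaves a tombstone that stays until `gc_grace_seconds` has passed and compaction purges it, so a read for the next live row has to step over all of them first. The toy model below (a plain list of `(key, deleted)` pairs, not Cassandra's storage engine) counts those tombstones. For scale, cassandra.yaml's default `tombstone_warn_threshold` is 1000 and `tombstone_failure_threshold` is 100000.

```python
def first_live(rows):
    """Return (key of the first live row, tombstones scanned to reach it)."""
    scanned = 0
    for key, deleted in rows:
        if deleted:
            scanned += 1      # a read must walk past every tombstone
        else:
            return key, scanned
    return None, scanned      # the whole partition was tombstones


# Queue-style partition: rows consumed (deleted) in the same order they are stored.
rows = [(i, i < 99_000) for i in range(100_000)]
print(first_live(rows))  # (99000, 99000)
```

Every "get next item" read scans nearly the whole deleted prefix, which is the source of the heap pressure and GC pauses.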

RE: READ Queries timing out.

2017-07-07 Thread Durity, Sean R
1 GB heap is very small. Why not try increasing it to 50% of RAM and see whether that helps you track down the real issue? It is hard to tune around a bad data model, if that is indeed the issue. Seeing your tables and queries would help. Sean Durity From: Pranay akula
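
The 50%-of-RAM starting point translates into JVM flags like this. It is a sketch that rounds down to whole gigabytes and sets `-Xms` equal to `-Xmx`. It adds `-XX:+UseG1GC` to reflect the G1 preference expressed elsewhere in these replies. Where the flags go (typically `jvm.options` in 3.x, `cassandra-env.sh` in older versions) depends on the install.

```python
def heap_flags(ram_gb: int, fraction: float = 0.5) -> list[str]:
    """JVM heap flags for a node with ram_gb of RAM (starting point, not a tuned value)."""
    heap_gb = max(1, int(ram_gb * fraction))  # round down, never below 1 GB
    return [f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G", "-XX:+UseG1GC"]


print(heap_flags(48))  # ['-Xms24G', '-Xmx24G', '-XX:+UseG1GC']
```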

RE: Starting Cassandra after restore of Data - get error

2017-07-07 Thread Durity, Sean R
I have seen Windows-format files (CRLF line endings) cause problems. Run dos2unix on the cassandra.yaml file (on the Linux box) and see if it helps. Sean Durity lord of the (C*) rings (Staff Systems Engineer - Cassandra) MTC 2250 #cassandra - for the latest news and updates From: Jonathan Baynes
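
If dos2unix is not installed, the core of what it does for this case (rewriting CRLF line endings as LF) is a few lines of Python. This sketch handles only CRLF, not the other conversions dos2unix supports, and rewrites the file only when something changed. The cassandra.yaml location varies by install, so the path in the usage line is hypothetical.

```python
from pathlib import Path


def crlf_to_lf(path: str) -> int:
    """Replace CRLF with LF in place; return how many line endings were converted."""
    p = Path(path)
    data = p.read_bytes()
    converted = data.count(b"\r\n")
    if converted:
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return converted


# Hypothetical path; check where your package or tarball puts the file.
# crlf_to_lf("/etc/cassandra/cassandra.yaml")
```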

RE: cassandra OOM

2017-04-25 Thread Durity, Sean R
We have seen much better stability (and MUCH fewer GC pauses) from G1 with a variety of heap sizes. I don’t even consider CMS any more. Sean Durity From: Gopal, Dhruva [mailto:dhruva.go...@aspect.com] Sent: Tuesday, April 04, 2017 5:34 PM To: user@cassandra.apache.org Subject: Re: cassandra OOM

RE: Can we get username and timestamp in cqlsh_history?

2017-04-03 Thread Durity, Sean R
Sounds like you want full auditing of CQL in the cluster. I have not seen anything built into the open source version for that (but I could be missing something). DataStax Enterprise does have an auditing feature. Sean Durity From: anuja jain [mailto:anujaja...@gmail.com] Sent: Wednesday,

RE: Issue with Cassandra consistency in results

2017-03-29 Thread Durity, Sean R
There have been many instances of supposed inconsistency noted on this list when nodes do not have the same system time. Make sure the clocks on all nodes are synchronized (NTP or similar). Sean Durity From: Shubham Jaju [mailto:shub...@vassarlabs.com] Sent: Tuesday, March 21, 2017 9:58 PM To:
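
Why clock skew looks like inconsistency: Cassandra resolves two writes to the same cell by write timestamp, and the higher timestamp wins. The timestamp is assigned by whichever machine the driver configuration makes responsible (the coordinator or the client). If that machine's clock is behind, a later update can carry an older timestamp and silently lose. The toy sketch below models only the "highest timestamp wins" rule. Tie-breaking is not modeled, and the times are hypothetical microsecond values.

```python
def winning_value(writes):
    """writes: list of (timestamp_micros, value); the highest timestamp wins."""
    return max(writes, key=lambda w: w[0])[1]


true_time = 1_700_000_000_000_000          # hypothetical wall-clock time, microseconds
first = (true_time, "old")                 # stamped by a machine with a correct clock
# Written 5 s later in real time, but stamped by a machine whose clock is 30 s behind:
second = (true_time + 5_000_000 - 30_000_000, "new")

print(winning_value([first, second]))  # old (the newer update is not visible)
```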

RE: results differ on two queries, based on secondary index key and partition key

2017-03-29 Thread Durity, Sean R
This looks more like a problem for a graph-based model. Have you looked at DSE Graph as a possibility? Sean Durity From: ferit baver elhuseyni [mailto:feritba...@gmail.com] Sent: Tuesday, March 14, 2017 11:40 AM To: user@cassandra.apache.org Subject: results differ on two queries, based on