Re: How do you set up networking for opening the Solr Web Interface in the cloud?
I have searched the internet but did not find any link that worked for me. Even https://s3.amazonaws.com/quickstart-reference/datastax/latest/doc/datastax-enterprise-on-the-aws-cloud.pdf says to use SSH tunneling:

"DSE nodes have no public IP addresses. Access to the web consoles for Solr or Spark can be established by using an SSH tunnel. For example, you can access the Solr console from http://NODE_IP:8983/solr/. You can bind to a local port with a command like the following (replacing the key and IP values for those of your cluster): ssh -v -i $KEY_FILE -L 8983:$NODE_IP:8983 ubuntu@$OPSC_PUBLIC_IP -N. The Solr console is then accessible at http://127.0.0.1:8983/solr/. When you're prompted to log in, enter the user name cassandra and the password you chose."

But I am not looking for the SSH tunneling option. I also tried to follow this link: https://forums.aws.amazon.com/thread.jspa?threadID=31406, but the DSE nodes have no public IP addresses, so that did not work either.

Thanks

On Mon, Apr 1, 2019 at 12:32 PM Rahul Singh wrote:

> This is probably not a question for this community, but rather for
> Datastax support or the Datastax Academy slack group. More specifically,
> this is a "how to expose Solr securely" question which is amply answered
> on the interwebs if you look for it on Google.
>
> rahul.xavier.si...@gmail.com
> http://cassandra.link
>
> I'm speaking at #DataStaxAccelerate, the world's premier #ApacheCassandra
> conference, and I want to see you there! Use my code Singh50 for 50% off
> your registration. www.datastax.com/accelerate
>
> On Mon, Apr 1, 2019 at 12:19 PM Krish Donald wrote:
>
>> Hi,
>>
>> We have a DSE Cassandra cluster running on AWS.
>> Now we have a requirement to enable Solr and Spark on the cluster.
>> We have Cassandra on a private data subnet which has connectivity to the
>> app layer.
>> From Cassandra, we can't open the Solr Web interface directly.
>> We tried using SSH tunneling and it works, but we can't give the SSH
>> tunneling option to developers.
>>
>> We would like to create a load balancer and put the Cassandra nodes
>> under it, but the question here is: what health check do I need to
>> configure for the load balancer so that it can open the Solr Web UI?
>>
>> My solution might not be perfect; please suggest any other solution if
>> you have one.
>>
>> Thanks
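For the load balancer question in the quoted message, a hedged sketch with the AWS CLI follows. All names and subnet/VPC IDs are placeholders; the health check path /solr/ is an assumption (DSE Search serves the Solr admin UI on port 8983, but if it redirects or requires authentication the path or matcher would need adjusting):

```shell
# Hypothetical internal ALB in front of the DSE Search nodes.
aws elbv2 create-load-balancer \
  --name solr-internal \
  --scheme internal \
  --type application \
  --subnets subnet-aaaa1111 subnet-bbbb2222

# Target group on the Solr port; the health check simply requests the
# Solr web root. If /solr/ answers with a redirect, widen the matcher
# (e.g. HttpCode=200,302).
aws elbv2 create-target-group \
  --name solr-nodes \
  --protocol HTTP --port 8983 \
  --vpc-id vpc-cccc3333 \
  --health-check-protocol HTTP \
  --health-check-path /solr/ \
  --matcher HttpCode=200
```

The nodes would then be registered with aws elbv2 register-targets, and developers would reach the UI via the ALB's internal DNS name from the app subnet, with no public IPs on the DSE nodes.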
Re: Five Questions for Cassandra Users
Answers inline.

1. Do the same people where you work operate the cluster and write the code to develop the application?

No, but the operators need to know development, data modeling, and generally how to "code" the application. (Coding is a low-level task of assigning a code to a concept, so I don't think that's the proper verb in these scenarios; engineering, software development, or even programming is a better term.) It's because the developers are hired a dime a dozen at the B/C level and then replaced by D/E/F level developers as things go on, so the data team eventually ends up being the expert on the application and the data platform, and a "Center of Excellence" for the developers/architects to work with on a collaborative basis.

2. Do you have a metrics stack that allows you to see graphs of various metrics with all the nodes displayed together?

Yes: OpsCenter, ELK, Grafana, and custom node data visualizers in Excel (because lines and charts don't tell you everything).

3. Do you have a log stack that allows you to see the logs for all the nodes together?

ELK and CloudWatch.

4. Do you regularly repair your clusters - such as by using Reaper?

Depends: cron, Reaper, OpsCenter Repair, and now NodeSync.

5. Do you use artificial intelligence to help manage your clusters?

Yes, I actually have made an artificial general intelligence called Gravitron. It learns by ingesting all the news articles I aggregate about Cassandra and the links I curate on cassandra.link into a Solr/Lucene index, and then uses clustering to find the most popular and most connected content. Once it does that, the content is summarized into human-readable text as well as interpreted bash code that gets pushed into a "Recipe Book." As the master operator identifies scenarios in plain English and then runs the bash commands, the machine slowly but surely "wakes up" and starts to manage itself.
It can also play Go (the game) and beat IBM's AlphaGo at Go, and Donald Trump at golf while he was cheating!

rahul.xavier.si...@gmail.com
http://cassandra.link

Happy April Fools' Day.

On Thu, Mar 28, 2019 at 5:03 AM Kenneth Brotman wrote:

> I'm looking to get a better feel for how people use Cassandra in
> practice. I thought others would benefit as well, so may I ask you the
> following five questions:
>
> 1. Do the same people where you work operate the cluster and write
> the code to develop the application?
>
> 2. Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
> 3. Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
> 4. Do you regularly repair your clusters - such as by using Reaper?
>
> 5. Do you use artificial intelligence to help manage your clusters?
>
> Thank you for taking your time to share this information!
>
> Kenneth Brotman
Re: How do you set up networking for opening the Solr Web Interface in the cloud?
This is probably not a question for this community, but rather for Datastax support or the Datastax Academy slack group. More specifically, this is a "how to expose Solr securely" question which is amply answered on the interwebs if you look for it on Google.

rahul.xavier.si...@gmail.com
http://cassandra.link

On Mon, Apr 1, 2019 at 12:19 PM Krish Donald wrote:

> Hi,
>
> We have a DSE Cassandra cluster running on AWS.
> Now we have a requirement to enable Solr and Spark on the cluster.
> We have Cassandra on a private data subnet which has connectivity to the
> app layer.
> From Cassandra, we can't open the Solr Web interface directly.
> We tried using SSH tunneling and it works, but we can't give the SSH
> tunneling option to developers.
>
> We would like to create a load balancer and put the Cassandra nodes under
> it, but the question here is: what health check do I need to configure
> for the load balancer so that it can open the Solr Web UI?
>
> My solution might not be perfect; please suggest any other solution if
> you have one.
>
> Thanks
Re: Best practices while designing backup storage system for big Cassandra cluster
At my current job I had to roll my own backup system. Hopefully I can get it OSS'd at some point. Here is a (now slightly outdated) presentation: https://docs.google.com/presentation/d/13Aps-IlQPYAa_V34ocR0E8Q4C8W2YZ6Jn5_BYGrjqFk/edit#slide=id.p

If you are struggling with the disk I/O cost of the sstable backups/copies, note that since sstables are append-only, if you adopt an incremental approach to your backups you only need to track a list of the current files and upload the files that are new compared to a previous successful backup. Your "manifest" of files for a node will need to reference the previous backup, and you'll want to "reset" with a full backup each month. I stole that idea from https://github.com/tbarbugli/cassandra_snapshotter. I would have used that, but we had more complex node access modes (Kubernetes, SSH through jump hosts, etc.) and needed lots of other features that weren't supported.

In AWS I use AWS profiles to throttle the transfers, and parallelize across nodes. The basic unit of a successful backup is a single node, but you'll obviously want to track overall cluster success. Note that in rack-based topologies you really only need one whole successful rack, and one DC, if your RF is > # racks.

Beware doing simultaneous flushes/snapshots across the whole cluster at once; that might be the equivalent of a DDoS. You might want to do a "jittered", randomized pre-flush of the cluster before doing the snapshotting. Unfortunately, the nature of a distributed system is that snapshotting all the nodes at precisely the same time is a hard problem.

I also do not / have not used the built-in incremental backup feature of Cassandra, which can enable more precise point-in-time backups (aside from the unflushed data in the commitlogs).

A note on incrementals with occasional FULLs: monthly FULL backups might take more than a day or two, especially throttled.
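The manifest-diff idea above can be sketched as follows. This is a toy, runnable stand-in: file names, paths, and the local "backup" directory are all hypothetical, and a real run would walk the node's Cassandra data directories and push to S3 instead of copying locally:

```shell
set -e
rm -rf demo && mkdir -p demo/data demo/backup

# Toy node state: two sstables, one of which was already captured by the
# previous successful backup.
touch demo/data/aa-1-big-Data.db demo/data/aa-2-big-Data.db
printf 'aa-1-big-Data.db\n' > demo/prev_manifest.txt

# Current manifest: every sstable now on disk.
ls demo/data | sort > demo/cur_manifest.txt

# sstables are immutable once written, so anything absent from the
# previous manifest is new and must be uploaded; everything else is skipped.
comm -13 demo/prev_manifest.txt demo/cur_manifest.txt > demo/new_files.txt

while read -r f; do
  cp "demo/data/$f" "demo/backup/$f"   # in practice: a throttled aws s3 cp
done < demo/new_files.txt

cat demo/new_files.txt                 # only the file the last backup missed
```

The current manifest, plus a reference to the previous one, is then stored alongside the uploaded files so the next run can repeat the diff.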
My incrementals were originally looking up previous manifests using only "most recent", but then the long-running FULL backups were excluded from the "chain" of incremental backups. So I now implement a fuzzy lookup for the incrementals that prioritizes any FULL in the last 5 days over any more recent incremental. That way you can purge old backups you don't need more safely, using the monthly full backups as a reset point.

On Mon, Apr 1, 2019 at 1:08 PM Alain RODRIGUEZ wrote:

> Hello Manish,
>
> I think any disk works, as long as it is big enough. It's also better if
> it's a reliable system (some kind of redundant RAID, NAS, or storage like
> GCS or S3...). During a backup we are not looking for speed so much as
> resiliency and not harming the source cluster.
> How fast you write to the backup storage system will more often be
> limited by what you can read from the source cluster.
> The backups have to be taken from running nodes, thus it's easy to
> overload the disk (reads), the network (exporting backup data to its
> final destination), and even the CPU (as/if the machine handles the
> transfer).
>
>> What are the best practices while designing backup storage system for
>> big Cassandra cluster?
>
> What is nice to have (not to say mandatory) is a system of incremental
> backups. You should not take all the data from the nodes every time, or
> you'll either harm the cluster regularly OR spend days transferring the
> data (once the amount of data grows big enough).
> I'm not speaking about Cassandra incremental snapshots, but of using
> something like AWS snapshots, or copying this behaviour programmatically
> to take (copy, link?) old SSTables from previous backups when they exist.
> This will greatly unload the cluster's work and the resources needed, as
> soon enough a substantial amount of the data should be coming from the
> backup data source itself.
> The problem with incremental snapshots is that when restoring, you have
> to restore multiple pieces, making it harder and involving a lot of
> compaction work.
> The "caching" technique mentioned above gives the best of the two worlds:
> - You always back up from the nodes only the sstables you don't already
> have in your backup storage system,
> - You always restore easily, as each backup is a full backup.
>
> It's not really a "hands-on" write-up, but this should let you know about
> existing ways to do backups and the tradeoffs. I wrote this a year ago:
> http://thelastpickle.com/blog/2018/04/03/cassandra-backup-and-restore-aws-ebs.html
>
> It's a complex topic; I hope some of this is helpful to you.
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> Le jeu. 28 mars 2019 à 11:24, manish khandelwal <
> manishkhandelwa...@gmail.com> a écrit :
>
>> Hi,
>>
>> I would like to know if there is any guideline for selecting a storage
>> device (disk type) for Cassandra backups.
>>
>> As per my current observation, NearLine (NL) disk on SAN slows down
>> significantly while copying backup files (taking a full backup) from
>> all nodes simultaneously.
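The fuzzy base-manifest lookup described earlier in this message can be sketched like this. The manifest naming scheme is hypothetical, and fixed dates stand in for "the last 5 days" so the example is deterministic:

```shell
set -e
rm -rf demo3 && mkdir -p demo3/manifests

# A long-running monthly FULL finished on the 30th; two incrementals
# completed after it.
touch demo3/manifests/FULL-2019-03-30.json
touch demo3/manifests/INCR-2019-03-31.json
touch demo3/manifests/INCR-2019-04-01.json

CUTOFF=2019-03-27   # "today" (2019-04-01) minus 5 days

# Prefer any FULL newer than the cutoff over a more recent incremental,
# so the incremental chain always resets at the monthly full backup.
BASE=""
for m in demo3/manifests/FULL-*.json; do
  d=${m#demo3/manifests/FULL-}; d=${d%.json}
  if [ "$d" \> "$CUTOFF" ]; then BASE=$m; fi
done
# Otherwise fall back to the most recent manifest of any kind.
[ -n "$BASE" ] || BASE=$(ls demo3/manifests/*.json | sort | tail -n 1)

echo "$BASE" > demo3/base.txt
cat demo3/base.txt
```

Purging backups older than the FULL chosen this way is then safe, because no later incremental in the chain refers back past it.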
Re: Best practices while designing backup storage system for big Cassandra cluster
Hello Manish,

I think any disk works, as long as it is big enough. It's also better if it's a reliable system (some kind of redundant RAID, NAS, or storage like GCS or S3...). During a backup we are not looking for speed so much as resiliency and not harming the source cluster. How fast you write to the backup storage system will more often be limited by what you can read from the source cluster. The backups have to be taken from running nodes, thus it's easy to overload the disk (reads), the network (exporting backup data to its final destination), and even the CPU (as/if the machine handles the transfer).

> What are the best practices while designing backup storage system for big
> Cassandra cluster?

What is nice to have (not to say mandatory) is a system of incremental backups. You should not take all the data from the nodes every time, or you'll either harm the cluster regularly OR spend days transferring the data (once the amount of data grows big enough). I'm not speaking about Cassandra incremental snapshots, but of using something like AWS snapshots, or copying this behaviour programmatically to take (copy, link?) old SSTables from previous backups when they exist. This will greatly unload the cluster's work and the resources needed, as soon enough a substantial amount of the data should be coming from the backup data source itself. The problem with incremental snapshots is that when restoring, you have to restore multiple pieces, making it harder and involving a lot of compaction work. The "caching" technique mentioned above gives the best of the two worlds:
- You always back up from the nodes only the sstables you don't already have in your backup storage system,
- You always restore easily, as each backup is a full backup.

It's not really a "hands-on" write-up, but this should let you know about existing ways to do backups and the tradeoffs. I wrote this a year ago: http://thelastpickle.com/blog/2018/04/03/cassandra-backup-and-restore-aws-ebs.html
It's a complex topic; I hope some of this is helpful to you.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Le jeu. 28 mars 2019 à 11:24, manish khandelwal <manishkhandelwa...@gmail.com> a écrit :

> Hi,
>
> I would like to know if there is any guideline for selecting a storage
> device (disk type) for Cassandra backups.
>
> As per my current observation, NearLine (NL) disk on SAN slows down
> significantly while copying backup files (taking a full backup) from all
> nodes simultaneously. Will using SSD disk on SAN help us in this regard?
>
> Apart from using SSD disks, what are the alternative approaches to make
> my backup process faster?
>
> What are the best practices while designing a backup storage system for
> a big Cassandra cluster?
>
> Regards,
>
> Manish
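The "caching" technique described in the reply above (each backup is a full backup, but unchanged SSTables are reused rather than re-copied) can be sketched with hard links on a local or NFS-style backup volume. Paths and file names here are toy stand-ins; on an object store like S3 the equivalent would be a server-side copy or a manifest pointing at already-uploaded objects:

```shell
set -e
rm -rf demo2 && mkdir -p demo2/data demo2/backups/day1

# The day1 backup already holds the old sstable; a new one appeared since.
echo old > demo2/data/old-Data.db
cp demo2/data/old-Data.db demo2/backups/day1/
echo new > demo2/data/new-Data.db

# Build day2 as a complete backup, but hard-link anything day1 already
# has: no extra disk space used, no extra transfer from the node.
mkdir -p demo2/backups/day2
for f in demo2/data/*.db; do
  name=$(basename "$f")
  if [ -e "demo2/backups/day1/$name" ]; then
    ln "demo2/backups/day1/$name" "demo2/backups/day2/$name"
  else
    cp "$f" "demo2/backups/day2/$name"
  fi
done

ls demo2/backups/day2   # a full file set, restorable on its own
```

Each backup directory can then be restored as a standalone full backup, while only the new SSTables were actually read from the cluster.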
How do you set up networking for opening the Solr Web Interface in the cloud?
Hi,

We have a DSE Cassandra cluster running on AWS.
Now we have a requirement to enable Solr and Spark on the cluster.
We have Cassandra on a private data subnet which has connectivity to the app layer.
From Cassandra, we can't open the Solr Web interface directly.
We tried using SSH tunneling and it works, but we can't give the SSH tunneling option to developers.

We would like to create a load balancer and put the Cassandra nodes under it, but the question here is: what health check do I need to configure for the load balancer so that it can open the Solr Web UI?

My solution might not be perfect; please suggest any other solution if you have one.

Thanks