RE: Migrating Cassandra to New Nodes
What you have described below should work just fine. When I was replacing nodes in my ring, I ended up creating a new datacenter with the new nodes, but I was also upgrading to vnodes at the time. -Arindam From: nash [mailto:nas...@gmail.com] Sent: Monday, April 28, 2014 10:52 PM To: user@cassandra.apache.org Subject: Migrating Cassandra to New Nodes I have a new set of nodes and I'd like to migrate my entire cluster onto them without any downtime. I believe that I can launch the new nodes, have them join the ring, and then use nodetool to decommission the old nodes one at a time. But I'm wondering: what is the safest way to update the seeds in the cassandra.yaml files? AFAICT, there is nothing particularly special about the choice of seeds. So, prior to starting the decommissions, I was figuring I could update all the seeds to some subset of the new nodes. Is that reliable? TIA, --nash
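A sketch of that sequence (host names are placeholders; each decommission streams the node's data to the remaining replicas, so let one finish before starting the next):

    # 1. On each new node, point cassandra.yaml at the existing seeds, then start it:
    #      - seeds: "old1,old2"
    sudo service cassandra start

    # 2. Once every new node shows UN in `nodetool status`, update the seeds
    #    in every node's cassandra.yaml to a subset of the new nodes:
    #      - seeds: "new1,new2"

    # 3. Retire the old nodes one at a time:
    nodetool -h old1 decommission
    nodetool -h old2 decommission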
Re: Load balancing issue with virtual nodes
Thank you, Ben, for the links. On Tue, Apr 29, 2014 at 3:40 AM, Ben Bromhead b...@instaclustr.com wrote: Some imbalance is expected and considered normal. See http://wiki.apache.org/cassandra/VirtualNodes/Balance as well as https://issues.apache.org/jira/browse/CASSANDRA-7032 Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr (http://twitter.com/instaclustr) | +61 415 936 359 On 29 Apr 2014, at 7:30 am, DuyHai Doan doanduy...@gmail.com wrote: Hello all. Some update about the issue. After completely wiping all the sstable/commitlog/saved_caches folders and restarting the cluster from scratch, we still see weird figures. After the restart, nodetool status does not show an exact balance of 50% of the data for each node:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
    UN  host1    48.57 KB  256     51.6%             d00de0d1-836f-4658-af64-3a12c00f47d6  rack1
    UN  host2    48.57 KB  256     48.4%             e9d2505b-7ba7-414c-8b17-af3bbe79ed9c  rack1

As you can see, the percentages are very close to 50% but not exactly 50%. What can explain that? Could it be a network connection issue during the initial token shuffle phase? P.S. both host1 and host2 are supposed to have exactly the same hardware. Regards, Duy Hai DOAN On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan batra...@yahoo.com wrote: I don't know about Hector, but the DataStax Java driver needs just one IP from the cluster and it will discover the rest of the nodes. Then by default it will do round robin when sending requests. So if Hector does the same, the pattern will again appear. Did you look at the size of the dirs? That documentation is for C* 0.8. It's old. But depending on your boxes you might reach a CPU bottleneck. You might want to google for the write path in Cassandra. According to that, there is not much to do when writes come in... On Friday, April 25, 2014 12:00 AM, DuyHai Doan doanduy...@gmail.com wrote: I did some experiments. Let's say we have node1 and node2. First, I configured Hector with node1 and node2 as hosts and I saw that only node1 had high CPU load. To eliminate the client connection issue, I re-tested with only node2 provided as host for Hector. Same pattern: CPU load is above 50% on node1 and below 10% on node2. It means that node2 is acting as coordinator and forwarding many write/read requests to node1. Why did I look at CPU load and not iostat et al.? Because I have a very write-intensive workload with a read-only-once pattern. I've read here (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning) that heavy writes in C* are more CPU bound, but the info may be outdated and no longer true. Regards, Duy Hai DOAN On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/24/2014 10:29 AM, DuyHai Doan wrote: Client used = Hector 1.1-4, default load balancing connection policy. Both node addresses are provided to Hector, so according to its connection policy the client should switch alternately between both nodes. OK, so is only one connection being established to one node for one bulk write operation? Or are multiple connections being made to both nodes and writes performed on both? -- Michael
Migrating from Snappy to LZ4 on C* 1.2
Hello, I am running mostly Cassandra 1.2 on my clusters, and I want to migrate my current Snappy-compressed CFs to LZ4. Changing the schema is easy; my questions are: 1. Will previously Snappy-compressed tables still be readable? 2. Will upgradesstables convert my current CFs from Snappy to LZ4, or do I have to run a major compaction? Thanks, Katriel
Re: JDK 8
Looks like it will be like with version 7... Cassandra has been compatible with that version for a long time, but there were no official validations, and DataStax recommended for a long time (still now?) to use Java 6. The safest thing would be to use an older version. If for some reason you use Java 8, run some tests and let us know how things go :). Good luck with this. 2014-04-29 1:09 GMT+02:00 Colin co...@clark.ws: It seems to run OK, but I haven't seen it in production on 8 yet. -- *Colin Clark* +1-320-221-9531 On Apr 28, 2014, at 4:01 PM, Ackerman, Mitchell mitchell.acker...@pgi.com wrote: I've been searching around but cannot find any information as to whether Cassandra runs on JRE 8. Any information on that? Thanks, Mitchell
Re: Migrating from Snappy to LZ4 on C* 1.2
Hi, I would say: 1 - Yes. 2 - Yes (no major compaction needed; upgradesstables should do the job). As always, in case of doubt, test it. In this case you can even do it on a local machine. Alain
Re: Can the seeds list be changed at runtime?
Hi Boying, From the DataStax documentation: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architectureGossipAbout_c.html "The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes." For this reason you can change the seed list on an existing node at any time, as the node itself will already be aware of the cluster and will not need to rely on the seed list to join. For new nodes that you want to bootstrap into the cluster, you can specify any nodes you wish. Mark On Tue, Apr 29, 2014 at 2:57 AM, Lu, Boying boying...@emc.com wrote: Hi, All, I wonder if I can change the seeds list at runtime, i.e. without changing the yaml file and restarting the DB service? Thanks, Boying
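For reference, the seed list in question lives under seed_provider in cassandra.yaml; a minimal sketch (addresses are placeholders):

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2"

A node re-reads this file only at startup; custom SeedProvider implementations are the usual route to fully dynamic seed lists (see the seeds thread further down in this digest).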
Re: JDK 8
"DataStax recommended for a long time (still now?) to use Java 6" -- Java 6 is recommended for version 1.2. Java 7 is required for version 2.0. Mark
Re: JDK 8
Thanks for the update, Mark.
Re: row caching for frequently updated column
Hello, IIRC, writing a new value to a row will invalidate the row cache for that value. The row cache is only populated after a read operation. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html?scroll=concept_ds_n35_nnr_ck Cassandra provides the ability to preheat the key and page caches, but I don't believe this is possible for the row cache. Hope that helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I am wondering if there is any negative impact on Cassandra write operations if I turn on row caching for a table that has mostly static columns plus a few frequently written columns (like a timestamp). The application will frequently write to a few columns, and it will also frequently query the entire row. How does Cassandra handle updating a column of a cached row? Does it update both the memtable value and the cached row's column (a purely in-memory update, so very fast)? Or does the entire row need to be read back from the sstable in order to update the cached row? thanks
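For context, the row cache in this era is a per-table flag plus a global capacity; a sketch of turning it on, assuming Cassandra 2.0 CQL syntax (keyspace/table are placeholders):

    -- cassandra.yaml must give the cache capacity first (0 disables it):
    --   row_cache_size_in_mb: 200
    ALTER TABLE my_ks.my_table WITH caching = 'rows_only';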
Re: JDK 8
Hi, When we look at the wiki, it says: "Cassandra requires the most stable version of Java 7 you can deploy, preferably the Oracle/Sun JVM." And in chapter 4 we see that they are using Cassandra 1.2: Connected to Test Cluster at localhost:9160. [cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0] Use HELP for help. In the DataStax documentation concerning the installation on Debian they say: "Install the latest version of Oracle Java SE Runtime Environment (JRE) 6 or 7. See Installing Oracle JRE on Debian or Ubuntu Systems." The fact that public updates for Java 6 stopped in February 2013 should help to choose between those two versions :) FYI, we have been using Java 7 for 2 years and are happy with it in production! Regards -- Cyril SCETBON
Re: Migrating from Snappy to LZ4 on C* 1.2
Thanks for the answer. I've tested it myself now, and indeed it works. The only note I have is that you have to run nodetool upgradesstables -a, so that all sstables are updated. Katriel
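Putting the thread together, the whole migration is roughly (keyspace/table are placeholders):

    ALTER TABLE my_ks.my_table
        WITH compression = {'sstable_compression': 'LZ4Compressor'};

followed on each node by:

    # -a rewrites even sstables that are already on the current version,
    # which is what forces the recompression from Snappy to LZ4
    nodetool upgradesstables -a my_ks my_table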
Re: Point in Time Recovery
Hi Rob, I know it has been a while, but we managed to perform a point-in-time recovery. I am not really sure what the problem was, but I guess it had to do with not following the instructions exactly (use GMT and not the local time zone, copy the archived commitlogs to the right place, etc.). So everything works as described, but I think there should be a little more automation in it. Thanks all, Dennis On 11.04.2014 21:11, Robert Coli wrote: On Fri, Apr 11, 2014 at 1:21 AM, Dennis Schwan dennis.sch...@1und1.de wrote: The archived commitlogs are copied to the restore directory and afterwards Cassandra replays those commitlogs, but we still only see the data from the snapshot, not the commitlogs. If you turn up debug log4j settings, you should be able to see whether the replay is correctly applying mutations to memtables. Do you see a flush of memtables to sstables at the end of commitlog replay? If not, memtables are not being created by commitlog replay. =Rob -- Dennis Schwan Oracle DBA Mail Core 1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany Phone: +49 721 91374-8738 E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de
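For anyone retracing this, the knobs involved live in conf/commitlog_archiving.properties; a sketch assuming Cassandra 2.0 (paths and the timestamp are placeholders, and the timestamp is read as GMT, which was one of the stumbling blocks above):

    # run when a commitlog segment is closed; %path and %name are substituted
    archive_command=/bin/cp %path /backup/commitlog_archive/%name
    # run at startup for each file found under restore_directories
    restore_command=/bin/cp -f %from %to
    restore_directories=/backup/commitlog_restore
    # replay mutations only up to this point in time (GMT)
    restore_point_in_time=2014:04:29 12:00:00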
RE: JDK 8
Thanks everyone From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] Sent: Tuesday, April 29, 2014 3:47 AM To: user@cassandra.apache.org Subject: Re: JDK 8
Re: row caching for frequently updated column
hi, "writing a new value to a row will invalidate the row cache for that value" -- do you mean the entire row will be invalidated, or just the column that was updated? I was reading through http://planetcassandra.org/blog/post/cassandra-11-tuning-for-frequent-column-updates/ which seems to indicate it just writes through and does not invalidate the entire row. If Cassandra invalidates the row cache upon a single column update to that row, that seems very inefficient.
Re: row caching for frequently updated column
if Cassandra invalidate the row cache upon a single column update to that row, that seems very inefficient. Yes. For the most recent direction, take a look at: https://issues.apache.org/jira/browse/CASSANDRA-5357 -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: row caching for frequently updated column
On Tue, Apr 29, 2014 at 9:30 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: If Cassandra invalidates the row cache upon a single column update to that row, that seems very inefficient. https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634 =Rob
Re: Migrating Cassandra to New Nodes
On Mon, Apr 28, 2014 at 10:52 PM, nash nas...@gmail.com wrote: I have a new set of nodes and I'd like to migrate my entire cluster onto them without any downtime. [...] Is that reliable? The fastest way to vertically scale a node is: https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/ As a minor note, you do lose any hints destined for that node while you are doing the copy, so use pre-copy techniques (rsync, then re-rsync with --delete) and then immediately repair to shorten the window of inconsistency if you read at CL.ONE. =Rob
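A sketch of that pre-copy technique (host and paths are placeholders):

    # pre-copy while the old node is still serving traffic
    rsync -av /var/lib/cassandra/data/ newhost:/var/lib/cassandra/data/

    # stop cassandra on the old node, then re-rsync; --delete drops files
    # that were compacted away since the first pass
    rsync -av --delete /var/lib/cassandra/data/ newhost:/var/lib/cassandra/data/

    # once the new node is up with the old node's address and tokens,
    # close the window of inconsistency
    nodetool repair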
Re: Point in Time Recovery
On Tue, Apr 29, 2014 at 7:46 AM, Dennis Schwan dennis.sch...@1und1.dewrote: I know it has been a while but we managed to perform a point-in-time recovery. I am not really sure what the problem was but I guess it has to do with not reading exactly (use GMT and not local time zone, copying archivelogs to the wrong place, etc.). Glad to hear things are working, thank you for sharing your experience back with the list community. :) =Rob
Re: Can the seeds list be changed at runtime?
On Mon, Apr 28, 2014 at 6:57 PM, Lu, Boying boying...@emc.com wrote: I wonder if I can change the seeds list at runtime, i.e. without changing the yaml file and restarting the DB service? There are dynamic seed providers; Priam for example uses one. https://issues.apache.org/jira/browse/CASSANDRA-5836 is a JIRA about the current confusion around the yaml-based seed list and what it means to be a seed, specifically in the context of bootstrapping. There is a trivial case that illustrates why seed lists need to be dynamic:

1) 3 node cluster, A B C, RF=1.
2) A is a seed, started first. B starts second, C starts third.
3) A and B fail. C does not fail.
4) A and B now have no seed to bootstrap from. C does not consider itself a seed in its own seed list.
5) C no longer has a node it gossips to once per gossip round, which is one of the only other seed-related differences.

Of course in practice you can just remove A from its own seed list, put C in A's seed list, and bootstrap it. But really what you should do is make C the seed in a dynamic seed provider. Datastax said: "The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes." Seed nodes are also gossiped to once per round, which some might argue makes them special. =Rob
Re: Cassandra data retention policy
Just a heads up -- this is only available in the latest version of Cassandra (2.0.6), and is not available in Cassandra 1.2. On Mon, Apr 28, 2014 at 12:57 PM, Donald Smith donald.sm...@audiencescience.com wrote: CQL lets you specify a default TTL per column family/table, with default_time_to_live = 86400. *From:* Redmumba [mailto:redmu...@gmail.com] *Sent:* Monday, April 28, 2014 12:51 PM *To:* user@cassandra.apache.org *Subject:* Re: Cassandra data retention policy Have you looked into using a TTL? You can set this per insert (unfortunately, it can't be set per CF) and values will be tombstoned after that amount of time. I.e., INSERT INTO ... VALUES ... USING TTL 15552000. Keep in mind, after the values have expired they essentially become tombstones -- so you will still need to run clean-ups (probably daily) to clear up space. Does this help? One caveat is that this is difficult to apply to existing rows -- i.e., you can't bulk-update a bunch of rows with this data. As such, another good suggestion is to simply have a secondary index on a date field of some kind, and run a bulk remove (and subsequent clean-up) daily/weekly/whatever. On Mon, Apr 28, 2014 at 11:31 AM, Han Jia johnideal...@gmail.com wrote: Hi guys, We have a processing system that only uses the data from the past six months in Cassandra. Any suggestions on the best way to manage the old data in order to save disk space? We want to keep it as backup, but it will not be used unless we need to do recovery. Thanks in advance! -John
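Spelled out, the two variants look like this (keyspace, table, and columns are placeholders; USING TTL is the full per-insert syntax):

    -- per-table default, available in 2.0 as noted above
    ALTER TABLE my_ks.events WITH default_time_to_live = 15552000;

    -- per-insert, works on 1.2 as well
    INSERT INTO my_ks.events (id, payload)
    VALUES (now(), 'some data')
    USING TTL 15552000;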
Re: Running hadoop jobs over compressed column familes with datastatx
I was able to solve the issue. There was another layer of compression happening in the DAO, using java.util.zip.Deflater/Inflater, on top of the Snappy compression defined on the CF. The solution was to extend CassandraStorage and override the getNext() method. The new implementation calls super.getNext() and inflates the Tuples where appropriate. -Marlon On Wed, Apr 23, 2014 at 1:39 PM, marlon hendred mhend...@gmail.com wrote: Hi, I'm attempting to dump a Pig relation of a compressed column family. It's a single column whose value is a JSON blob. It's compressed via Snappy, and the value validator is BytesType. After I create the relation and dump it, I get garbage. Here is the describe:

    ColumnFamily: CF
      Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
      GC grace seconds: 86400
      Compaction min/max thresholds: 2/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Pig stuff:

    rows = LOAD 'cql://Keyspace/CF' using CqlStorage();

I've tried to overwrite the schema by adding 'as (key: chararray, col1: chararray, value: chararray)' but when I dump this it still looks like binary. Do I need to implement my own CqlStorage() here that uncompresses, or am I just missing something? I've done some googling but haven't seen anything on the subject. Also, I am using DataStax Enterprise 3.1. Thanks in advance! -m
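As a sketch of the fix Marlon describes (not his actual code), assuming the Pig integration's CassandraStorage class under org.apache.cassandra.hadoop.pig and a hypothetical tuple position for the compressed column:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    import org.apache.cassandra.hadoop.pig.CassandraStorage;
    import org.apache.pig.data.DataByteArray;
    import org.apache.pig.data.Tuple;

    public class InflatingCassandraStorage extends CassandraStorage {
        // hypothetical: position of the Deflater-compressed column in the tuple
        private static final int VALUE_INDEX = 1;

        @Override
        public Tuple getNext() throws IOException {
            Tuple t = super.getNext();
            if (t == null)
                return null; // end of input
            Object raw = t.get(VALUE_INDEX);
            if (raw instanceof DataByteArray)
                t.set(VALUE_INDEX, new DataByteArray(inflate(((DataByteArray) raw).get())));
            return t;
        }

        private static byte[] inflate(byte[] compressed) throws IOException {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 4);
            byte[] buf = new byte[4096];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0)
                        break; // truncated input; stop rather than spin
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IOException("value was not Deflater-compressed", e);
            } finally {
                inflater.end();
            }
            return out.toByteArray();
        }
    }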
Re: row caching for frequently updated column
Are these issues 'resolved' only in 2.0 or a later release? What about the 1.2 version?
Re: row caching for frequently updated column
On Tue, Apr 29, 2014 at 1:53 PM, Brian Lam y2k...@gmail.com wrote: Are these issues 'resolved' only in 2.0 or later release? What about 1.2 version? As I understand it : 1.2 version has the on-heap row cache and off-heap row cache. It does not have the new partition cache. 2.0 version has only the off-heap row cache. It does not have the on-heap row cache or the new partition cache. 2.1 version has the new partition cache. In summary, you probably don't want to use any of these half-baked, immature internal row/etc. caches unless you are very, very certain that you have an ideal case for them. =Rob
Connect Cassandra rings in datacenter and ec2
Hi, We're planning to deploy 3 Cassandra rings: one in our datacenter (with more nodes/power) and two others in EC2. We don't have enough public IPs to assign one to each individual node in our datacenter, so I wonder how we could connect the clusters together. Has anyone tried this before, and is this a good way to deploy Cassandra? Thanks, Trung.
[no subject]
Hi there, We are working on an API service that receives arbitrary JSON data; the data can be nested JSON or just flat JSON. We started using Astyanax, but we noticed we couldn't use CQL3 to target the arbitrary columns; in CQL3 those arbitrary columns aren't available. Ad-hoc queries are to be run against these arbitrary data stored in Cassandra. -- Ebot T.
Re:
Hi Elder. Welcome. We hope to help you. -- Cheers! Otávio Gonçalves de Santana blog: http://otaviosantana.blogspot.com.br/ twitter: http://twitter.com/otaviojava site: http://about.me/otaviojava 55 (11) 98255-3513
Re:
I am hoping as well to get help on how to handle such a scenario; the reason we chose Cassandra was its performance for heavy writes. -- Ebot T.
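For the schema question itself, one common CQL3 workaround for arbitrary columns is to model the column name as a clustering column; a sketch, not from the thread (all names are placeholders):

    CREATE TABLE api_data (
        doc_id timeuuid,
        field text,   -- flattened json key, e.g. 'user.address.city'
        value text,   -- json-encoded value
        PRIMARY KEY (doc_id, field)
    );

    -- fetch a single arbitrary field of a document
    SELECT value FROM api_data WHERE doc_id = ? AND field = 'user.address.city';

Each logical document becomes one wide row, writes stay cheap, and the field names no longer have to be known to the schema.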
Re: Connect Cassandra rings in datacenter and ec2
You will need to have the nodes running on AWS in a VPC. You can then configure a VPN to work with your VPC; see http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html. Also, as you will have multiple VPN connections (from your private DC and the other AWS region), AWS CloudHub will be the way to go: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPN_CloudHub.html. Additionally, to access your Cassandra instances from your other VPCs you can use VPC peering (within the same region). See http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 30 Apr 2014, at 11:38 am, Chris Lohfink clohf...@blackbirdit.com wrote: Cassandra will require a different address per node, though -- or at least one unique internal address within the same DC and one unique external address for the other DCs. You could look into http://aws.amazon.com/vpc/ or some other VPN solution. --- Chris Lohfink
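Whichever VPN route is taken, each node then has to advertise an address the remote datacenters can actually reach; a minimal cassandra.yaml sketch for one node (all addresses are placeholders):

    listen_address: 10.0.1.5        # private address inside this DC/VPC
    broadcast_address: 172.16.1.5   # address the other DCs reach over the VPN
    endpoint_snitch: GossipingPropertyFileSnitch   # any multi-DC-aware snitch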
Re: row caching for frequently updated column
Thanks all for the pointers. Let me see if I can put the sequence of events together:

1.2: people misunderstood/misused the row cache; Cassandra caches the entire row of data even if you are only looking at a small subset of it. E.g. "select single_column from a_wide_row_table" results in the entire row being cached, even if you are only interested in one single column of the row.

2.0: because of the potential misuse of heap memory, Cassandra 2.0 removed the heap cache and only supports the off-heap cache, which has the side effect that a write invalidates the row cache (my original question).

2.1: the coming Cassandra 2.1 will offer a true cache-by-query, so the cached data will be much more efficient even for wide rows (it caches what it needs).

Do I have it right? For the new 2.1 row caching, is it still true that a write or update to the row will invalidate the cached row?
Cassandra Client authentication and system table replication question
Hi, We have enabled Cassandra client authentication and have set a new user/pass per keyspace. As I understand it, the user/pass is stored in a system table; do we need to change the replication factor of that table so this data is replicated? The cluster is going to be multi-DC. Thanks, Anand
Re: Cassandra Client authentication and system table replication question
Correction: the credentials are stored in the system_auth keyspace. So is it ok/recommended to change the replication factor of that keyspace?
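For reference, raising system_auth replication is a plain ALTER KEYSPACE; a sketch with hypothetical datacenter names, followed by a repair of that keyspace on each node so existing credentials get replicated:

    ALTER KEYSPACE system_auth
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

then on each node:

    nodetool repair system_auth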