Re: write timeout
My group is seeing the same thing and also cannot figure out why it's happening. On Mon, Mar 23, 2015 at 8:36 AM, Anishek Agarwal anis...@gmail.com wrote: Forgot to mention I am using Cassandra 2.0.13 On Mon, Mar 23, 2015 at 5:59 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am using a single node server class machine with 16 CPUs and 32GB RAM with a single drive attached to it. My table structure is as below: CREATE TABLE t1 (id bigint, ts timestamp, cat1 set<text>, cat2 set<text>, lat float, lon float, a bigint, primary key (id, ts)); I am trying to insert 300 entries per partition key for 4000 partition keys using 25 threads. Configuration: write_request_timeout_in_ms: 5000, concurrent_writes: 32, heap space: 8GB. Client side timeout is 12 sec using the DataStax Java driver. Consistency level: ONE. With the above configuration I run this 10 times to eventually generate around 300 * 4000 * 10 = 12,000,000 entries. After the first few runs I get a WriteTimeout exception at the client with a "1 replica were required but only 0 acknowledged the write" message. There are no errors in the server log. Why does this error occur, and how do I know the limit to which I should restrict concurrent writes to a single node? Looking at iostat, disk utilization seems to be at 1-3% when running this. Please let me know if anything else is required. Regards, Anishek -- http://about.me/BrianTarbox
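For reference, a minimal sketch of the client setup described above, assuming the DataStax Java driver 2.0 API; the contact point, keyspace, and bound values are illustrative, and this only reproduces the configuration in the thread (12-second client timeout, consistency ONE), not a fix:

    import com.datastax.driver.core.*;

    public class WriteTimeoutExample {
        public static void main(String[] args) {
            // Raise the client-side read timeout to 12s, as described in the thread.
            SocketOptions socketOptions = new SocketOptions().setReadTimeoutMillis(12000);
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withSocketOptions(socketOptions)
                    .build();
            Session session = cluster.connect("test_ks");

            PreparedStatement insert = session.prepare(
                    "INSERT INTO t1 (id, ts, lat, lon, a) VALUES (?, ?, ?, ?, ?)");
            insert.setConsistencyLevel(ConsistencyLevel.ONE);   // writes at CL.ONE

            // 300 rows for one partition key; the real test runs 4000 partitions from 25 threads.
            long partition = 42L;
            for (int i = 0; i < 300; i++) {
                session.execute(insert.bind(partition, new java.util.Date(), 1.0f, 2.0f, (long) i));
            }
            cluster.close();
        }
    }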
Re: High read latency after data volume increased
C* seems to have more than its share of "version X doesn't work, use version Y" type issues. On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote: We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2. https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ 2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ... =Rob -- http://about.me/BrianTarbox
multiple threads updating result in TransportException
We're running into a problem where things are fine if our client runs single-threaded but we get TransportException if we use multiple threads. The DataStax driver gets an NIO checkBounds error. Here is a link to a Stack Overflow question we found that describes the problem we're seeing. This question was asked 7 months ago and got no answers. We're running C* 2.0.9 and see the problem on our single node test cluster. Here is the stack trace we see:
at java.nio.Buffer.checkBounds(Buffer.java:559) ~[na:1.7.0_55]
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143) ~[na:1.7.0_55]
at org.jboss.netty.buffer.HeapChannelBuffer.setBytes(HeapChannelBuffer.java:136) ~[netty-3.7.0.Final.jar:na]
at org.jboss.netty.buffer.AbstractChannelBuffer.writeBytes(AbstractChannelBuffer.java:472) ~[netty-3.7.0.Final.jar:na]
at com.datastax.driver.core.CBUtil.writeValue(CBUtil.java:272) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at com.datastax.driver.core.CBUtil.writeValueList(CBUtil.java:297) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at com.datastax.driver.core.Requests$QueryProtocolOptions.encode(Requests.java:223) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at com.datastax.driver.core.Requests$Execute$1.encode(Requests.java:122) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at com.datastax.driver.core.Requests$Execute$1.encode(Requests.java:119) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at com.datastax.driver.core.Message$ProtocolEncoder.encode(Message.java:184) ~[cassandra-driver-core-2.0.0-rc2.jar:na]
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:66) ~[netty-3.7.0.Final.jar:na]
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59) ~[netty-3.7.0.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:704) ~[netty-3.7.0.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:671) ~[netty-3.7.0.Final.jar:na]
at org.jboss.netty.channel.Ab
-- http://about.me/BrianTarbox
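A pattern worth ruling out with this kind of multi-threaded encoding failure: in the DataStax Java driver, Cluster, Session, and PreparedStatement are safe to share across threads, but a BoundStatement is not, and sharing one across writer threads can corrupt the request while it is being encoded. A sketch of the per-thread bind pattern follows; the keyspace, table, and thread counts are hypothetical, and this is only one possible cause of the exception above (the cassandra-driver-core-2.0.0-rc2 jar in the trace is also a pre-release build, so trying a released 2.0.x driver is worth doing as well):

    import com.datastax.driver.core.*;
    import java.util.concurrent.*;

    public class MultiThreadedWrites {
        public static void main(String[] args) throws Exception {
            final Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            final Session session = cluster.connect("test_ks");            // shared: thread-safe
            final PreparedStatement ps = session.prepare(
                    "INSERT INTO events (id, payload) VALUES (?, ?)");     // shared: thread-safe

            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int t = 0; t < 8; t++) {
                final int thread = t;
                pool.submit(new Runnable() {
                    public void run() {
                        for (int i = 0; i < 1000; i++) {
                            // Each thread binds its own BoundStatement; never share one across threads.
                            BoundStatement bound = ps.bind((long) (thread * 1000 + i), "payload-" + i);
                            session.execute(bound);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
            cluster.close();
        }
    }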
read after write inconsistent even on a one node cluster
We're doing development on a single node cluster (and yes, of course we're not really deploying that way), and we're getting inconsistent behavior on reads after writes. We write values to our keyspaces and then immediately read the values back (in our Cucumber tests). About 20% of the time we get the old value. If we wait 1 second and redo the query (within the same java method) we get the new value. This is all happening on a single node... how is this possible? We're using 2.0.9 and the java client. Though it shouldn't matter given a single node cluster, I set the consistency level to ALL with no effect. I've read CASSANDRA-876 which seems spot-on but it was closed as won't-fix... and I don't see what the solution is. Thanks in advance for any help. Brian Tarbox -- http://about.me/BrianTarbox
Re: read after write inconsistent even on a one node cluster
Thanks. Right now it's just for testing, but in general we can't guard against multiple users ending up in the situation where one writes and then one reads. It would be one thing if the read just got old data, but we're seeing it return wrong data... i.e. data that doesn't correspond to any particular version of the object. Brian On Thu, Nov 6, 2014 at 10:30 AM, Eric Stevens migh...@gmail.com wrote: If this is just for doing tests to make sure you get back the data you expect, I would recommend looking at some sort of "eventually" construct in your testing. We use Specs2 as our testing framework, and our write-then-read tests look something like this: someDAO.write(someObject) eventually { someDAO.read(someObject.id) mustEqual someObject } This will retry the read repeatedly over a short duration. Just in case you are trying to do write-then-read outside of tests, you should be aware that it's a Bad Idea™, but your email reads like you already know that =) On Thu Nov 06 2014 at 7:16:25 AM Brian Tarbox briantar...@gmail.com wrote: We're doing development on a single node cluster (and yes, of course we're not really deploying that way), and we're getting inconsistent behavior on reads after writes. We write values to our keyspaces and then immediately read the values back (in our Cucumber tests). About 20% of the time we get the old value. If we wait 1 second and redo the query (within the same java method) we get the new value. This is all happening on a single node... how is this possible? We're using 2.0.9 and the java client. Though it shouldn't matter given a single node cluster, I set the consistency level to ALL with no effect. I've read CASSANDRA-876 which seems spot-on but it was closed as won't-fix... and I don't see what the solution is. Thanks in advance for any help. Brian Tarbox -- http://about.me/BrianTarbox -- http://about.me/BrianTarbox
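The same retry-until-consistent idea as the Specs2 "eventually" block above, sketched in plain Java for test code that cannot use Specs2; the DAO, timeout, and poll interval here are hypothetical:

    // A small plain-Java analogue of the Specs2 "eventually" block, for tests only.
    public abstract class Eventually<T> {
        protected abstract T read();          // the read to retry, e.g. someDAO.read(id)

        public T await(T expected, long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            T last = null;
            while (System.currentTimeMillis() < deadline) {
                last = read();
                if (expected.equals(last)) {
                    return last;              // the read caught up with the write
                }
                Thread.sleep(50);             // brief pause before retrying
            }
            throw new AssertionError("expected " + expected + " but last read was " + last);
        }
    }

    Usage in a test would look roughly like:
    new Eventually<MyObject>() { protected MyObject read() { return someDAO.read(id); } }.await(someObject, 2000);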
Re: Options for expanding Cassandra cluster on AWS
The last guidance I heard from DataStax was to use m2.2xlarge's on AWS and put data on the ephemeral drive... have they changed this guidance? Brian On Tue, Aug 19, 2014 at 9:41 AM, Oleg Dulin oleg.du...@gmail.com wrote: Distinguished Colleagues: Our current Cassandra cluster on AWS looks like this: 3 nodes in N. Virginia, one per zone. RF=3. Each node is a c3.4xlarge with 2x160G SSDs in RAID-0 (~300 Gig SSD on each node). Works great; I find it the most optimal configuration for a Cassandra node. But the time is coming soon when I need to expand storage capacity. I have the following options in front of me: 1) Add 3 more c3.4xlarge nodes. This keeps the amount of data on each node reasonable, and all repairs and other tasks can complete in a reasonable amount of time. The downside is that c3.4xlarge are pricey. 2) Add provisioned EBS volumes. These days I can get SSD-backed EBS with up to 4000 IOPS provisioned. I can add those volumes to the data_directories list in the yaml, and I expect Cassandra can deal with that JBOD-style. The upside is that it is much cheaper than option #1 above; the downside is that it is a much slower configuration and repairs can take longer. I'd appreciate any input on this topic. Thanks in advance, Oleg -- http://about.me/BrianTarbox
do all nodes actually send the data to the coordinator when doing a read?
We're considering a C* setup with very large columns and I have a question about the details of reads. I understand that a read request gets handled by the coordinator, which sends read requests to a quorum of the nodes holding replicas of the data, and once a quorum of nodes has replied with consistent data it is returned to the client. My understanding is that each of the nodes actually sends the full data being requested to the coordinator (which in the case of very large columns would involve lots of network traffic). Is that right? The alternative (which I don't think is the case but I've been asked to verify) is that the replicas first send metadata to the coordinator, which then asks one replica to send the actual data. Again, I don't think this is the case but was asked to confirm. Thanks. -- http://about.me/BrianTarbox
nodetool repair saying starting and then nothing, and nothing in any of the server logs either
I have a six node cluster in AWS (repl: 3) and recently noticed that repair was hanging. I've run with the -pr switch. I see this output on the nodetool command line (and also in that node's system.log): "Starting repair command #9, repairing 256 ranges for keyspace dev_a" but then no other output. And I see nothing in any of the other nodes' log files. Right now the application using C* is turned off so there is zero activity. I've let it sit in this state for up to 24 hours with nothing more logged. Any suggestions?
Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either
We're running 1.2.13. Any chance that doing a rolling-restart would help? Would running without the -pr improve the odds? Thanks. On Tue, Jul 1, 2014 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 1, 2014 at 9:24 AM, Brian Tarbox tar...@cabotresearch.com wrote: I have a six node cluster in AWS (repl:3) and recently noticed that repair was hanging. I've run with the -pr switch. It'll do that. What version of Cassandra? =Rob
Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either
Does this output from jstack indicate a problem?

ReadRepairStage:12170 daemon prio=10 tid=0x7f9dcc018800 nid=0x7361 waiting on condition [0x7f9db540c000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x000613e049d8 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

ReadRepairStage:12169 daemon prio=10 tid=0x7f9dd4009000 nid=0x7340 waiting on condition [0x7f9db53cb000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x000613e049d8 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

ReadRepairStage:12168 daemon prio=10 tid=0x7f9dd001d000 nid=0x733f waiting on condition [0x7f9db51a6000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x000613e049d8 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

On Tue, Jul 1, 2014 at 2:09 PM, Brian Tarbox tar...@cabotresearch.com wrote: We're running 1.2.13. Any chance that doing a rolling-restart would help? Would running without the -pr improve the odds? Thanks. On Tue, Jul 1, 2014 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 1, 2014 at 9:24 AM, Brian Tarbox tar...@cabotresearch.com wrote: I have a six node cluster in AWS (repl:3) and recently noticed that repair was hanging. I've run with the -pr switch. It'll do that. What version of Cassandra? =Rob
Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either
Given that an upgrade is (for various internal reasons) not an option at this point...is there anything I can do to get repair working again? I'll also mention that I see this behavior from all nodes. Thanks. On Tue, Jul 1, 2014 at 2:51 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 1, 2014 at 11:09 AM, Brian Tarbox tar...@cabotresearch.com wrote: We're running 1.2.13. 1.2.17 contains a few streaming fixes which might help. Any chance that doing a rolling-restart would help? Probably not. Would running without the -pr improve the odds? No, that'd make it less likely to succeed. =Rob
Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either
"For what purpose are you running repair?" Because I read that we should! :-) We do delete data from one column family quite regularly... from the other CFs occasionally. We almost never run with less than 100% of our nodes up. In this configuration do we *need* to run repair? Thanks, On Tue, Jul 1, 2014 at 2:57 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 1, 2014 at 11:54 AM, Brian Tarbox tar...@cabotresearch.com wrote: Given that an upgrade is (for various internal reasons) not an option at this point... is there anything I can do to get repair working again? I'll also mention that I see this behavior from all nodes. I think maybe increasing your phi tolerance for streaming timeouts might help. But basically, no. Repair has historically been quite broken in AWS. It was re-written in 2.0 along with the rest of streaming, and hopefully will soon stabilize and actually work. For what purpose are you running repair? =Rob
running out of diskspace during maintenance tasks
I'm running on AWS m2.2xlarge instances using the ~800 gig ephemeral/attached disk for my data directory. My data size per node is nearing 400 gig. Sometimes during maintenance operations (repairs mostly I think) I run out of disk space as my understanding is that some of these operations require double the space of one's data. Since I can't change the size of attached storage for my instance type my question is can I somehow get these maintenance operations to use other volumes? Failing that, what are my options? Thanks. Brian Tarbox
Re: running out of diskspace during maintenance tasks
We do a repair -pr on each node once a week on a rolling basis. Should we be running cleanup as well? My understanding is that it's only needed after adding/removing nodes? We'd like to avoid adding nodes if possible (which might not be possible). Still curious whether we can get C* to do the maintenance tasks on a separate volume. Thanks. On Wed, Jun 18, 2014 at 12:03 PM, Jeremy Jongsma jer...@barchart.com wrote: One option is to add new nodes, and do a node repair/cleanup on everything. That will at least reduce your per-node data size. On Wed, Jun 18, 2014 at 11:01 AM, Brian Tarbox tar...@cabotresearch.com wrote: I'm running on AWS m2.2xlarge instances using the ~800 gig ephemeral/attached disk for my data directory. My data size per node is nearing 400 gig. Sometimes during maintenance operations (repairs mostly I think) I run out of disk space, as my understanding is that some of these operations require double the space of one's data. Since I can't change the size of attached storage for my instance type, my question is: can I somehow get these maintenance operations to use other volumes? Failing that, what are my options? Thanks. Brian Tarbox
can I kill very old data files in my data folder (I know that sounds crazy but....)
I have a column family that only stores the last 5 days' worth of some data... and yet I have files in the data directory for this CF that are 3 weeks old. They take the form:
keyspace-CFName-ic--Filter.db
keyspace-CFName-ic--Index.db
keyspace-CFName-ic--Data.db
keyspace-CFName-ic--Statistics.db
keyspace-CFName-ic--TOC.txt
keyspace-CFName-ic--Summary.db
I have six bunches of these file groups, each with a different value... and with timestamps from each of the last five days... plus one group from 3 weeks ago... which makes me wonder if that group somehow should have been deleted but was not. The files are tens or hundreds of gigs, so deleting would be good, unless it's really bad! Thanks, Brian Tarbox
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
Rob, thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis if I understand you. Thanks, Brian On Wed, Jun 18, 2014 at 2:37 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox tar...@cabotresearch.com wrote: I have a column family that only stores the last 5 days' worth of some data... and yet I have files in the data directory for this CF that are 3 weeks old. Are you using TTL? If so: https://issues.apache.org/jira/browse/CASSANDRA-6654 Are you using size tiered or level compaction? I have six bunches of these file groups, each with a different value... and with timestamps from each of the last five days... plus one group from 3 weeks ago... which makes me wonder if that group somehow should have been deleted but was not. The files are tens or hundreds of gigs, so deleting would be good, unless it's really bad! Data files can't be deleted from the data dir with Cassandra running, but it should be fine (if probably technically unsupported) to delete them with Cassandra stopped. In most cases you don't want to do so, because you might un-mask deleted rows or cause unexpected consistency characteristics. In your case, you know that no data in files created 3 weeks ago can possibly have any value, so it is safe to delete them. =Rob
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
I don't think I have the space to run a major compaction right now (I'm above 50% disk space used already) and compaction can take extra space I think? On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com wrote: Thank you! We are not using TTL, we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this cf is append-only i.e.zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis if I understand you. Sure, though in your case (because you're using STS and can) I'd probably just run a major compaction. =Rob
Specifying startBefore with iterators with compositeKeys
I have a composite key consisting of: (integer, bytes) and I have rows like: (1,abc), (1,def), (2,abc), (2,def) and I want to find all rows with the integer part = 2. I need to create a startBeyondName using CompositeType.Builder class and am wondering if specifying (2, Bytes.Empty) will sort correctly? I think another way of saying this is: does HeapByteBuffer with pos=,lim=0,cap=0 sort prior to any other possible HeapByteBuffer? Thanks.
getting dropped messages in log even with no one running
I'm getting "messages dropped" log messages in my cluster even when (like right now) there are no clients running against the cluster. 1) Who could be generating the traffic if there are no clients? 2) Is there a way to list active clients... on the off chance that there is a client I don't know about? 3) Why is "messages dropped" an INFO rather than a WARNING? I'm running 1.2.13 on a six node AWS cluster on m2-2xlarge servers. Any help is appreciated. Brian
Re: getting dropped messages in log even with no one running
The problem was that one of my nodes was in some kind of bad hinted-handoff loop. I looked at CASSANDRA-4740 which discusses this, but with no solution that I could see. When I killed the server trying to do the hinted handoffs, the other nodes stopped complaining... as soon as I restarted the node all the other nodes went right back into the dropping-messages state. Help please. Brian On Mon, Mar 24, 2014 at 10:01 AM, Brian Tarbox tar...@cabotresearch.com wrote: I'm getting "messages dropped" log messages in my cluster even when (like right now) there are no clients running against the cluster. 1) Who could be generating the traffic if there are no clients? 2) Is there a way to list active clients... on the off chance that there is a client I don't know about? 3) Why is "messages dropped" an INFO rather than a WARNING? I'm running 1.2.13 on a six node AWS cluster on m2-2xlarge servers. Any help is appreciated. Brian
getting lots of dropped messages/requests/mutations but only on 2 of 6 servers
I have a six node cluster (running m2-2xlarge instances in AWS) with RF=3 and I'm seeing two of the six nodes reporting lots of dropped messages. The six machines are identical (created from same AWS AMI) so this local behavior has me puzzled. BTW this is mostly happening when I'm reading via secondary indexes. I only have 40 gig of data or so on each machine. Any help appreciated.
Re: this seems like a flaw in Node Selection
Does this still apply since we're using 1.2.13? (Should have said that in the original message.) Thank you. On Thu, Mar 20, 2014 at 3:57 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Mar 20, 2014 at 12:31 PM, Brian Tarbox tar...@cabotresearch.com wrote: I've seen this problem with other companies and products: least-loaded as a means of picking servers is almost always liable to death spirals when a server has a failure. Is there any way to configure away from this in C*? Disable the dynamic snitch... via the hidden but still valid [1] configuration directive dynamic_snitch in cassandra.yaml: dynamic_snitch: false https://issues.apache.org/jira/browse/CASSANDRA-3229 Given the various other bad edge cases with the Dynamic Snitch (sending requests to the wrong DC on reset, etc.) this might be worth considering in general... especially with the speculative execution stuff in 2.0... I do note that https://issues.apache.org/jira/browse/CASSANDRA-6465 has a patch in 2.0.5, yay... =Rob
Re: this seems like a flaw in Node Selection
Yes, I was going to say (sorry for the brain-freeze) that this is behavior in Pelops not in C* itself. On Thu, Mar 20, 2014 at 4:15 PM, Tyler Hobbs ty...@datastax.com wrote: Brian, Are you referring to Pelops? The code you mentioned doesn't exist in Cassandra. On Thu, Mar 20, 2014 at 3:07 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Mar 20, 2014 at 1:03 PM, Brian Tarbox tar...@cabotresearch.comwrote: Does this still apply since we're using 1.2.13? (should have said that in the original message) I checked the cassandra-1.2 branch to verify that the dynamic_snitch config file option is still supported there; it is. =Rob -- Tyler Hobbs DataStax http://datastax.com/
in AWS is it worth trying to talk to a server in the same zone as your client?
We're running a C* cluster with 6 servers spread across the four us-east1 zones. We also spread our clients (hundreds of them) across the four zones. Currently we give our clients a connection string listing all six servers and let C* do its thing. This is all working just fine...and we're paying a fair bit in AWS transfer costs. There is a suspicion that this transfer cost is driven by us passing data around between our C* servers and clients. Would there be any value to trying to get a client to talk to one of the C* servers in its own zone? I understand (at least partially!) about coordinator nodes and replication and know that no matter which server is the coordinator for an operation replication may cause bits to get transferred to/from servers in other zones. Having said that...is there a chance that trying to encourage a client to initially contact a server in its own zone would help? Thank you, Brian Tarbox
Re: in AWS is it worth trying to talk to a server in the same zone as your client?
We're definitely using all private IPs. I guess my question really is: with repl=3 and quorum operations I know we're going to push/pull bits across the various AZs within us-east-1. So, does having the client start the conversation with a server in the same AZ save us anything? On Wed, Feb 12, 2014 at 4:14 PM, Ben Bromhead b...@instaclustr.com wrote: $0.01/GB between zones irrespective of IP is correct. As for your original question, depending on the driver you are using you could write a custom co-ordinator node selection policy. For example if you are using the Datastax driver you would extend http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html ... and set the distance based on which zone the node is in. An alternate method would be to define the zones as data centres and then you could leverage existing DC aware policies (we've never tried this though). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359 On 13/02/2014, at 8:00 AM, Andrey Ilinykh ailin...@gmail.com wrote: I think you are mistaken. It is true for the same zone; between zones it is $0.01/GB. On Wed, Feb 12, 2014 at 12:17 PM, Russell Bradberry rbradbe...@gmail.com wrote: Not when using private IP addresses. That pricing *only* applies if you are using the public interface or EIP/ENI. If you use the private IP addresses there is no cost associated. On February 12, 2014 at 3:13:58 PM, William Oberman (ober...@civicscience.com) wrote: Same region, cross zone transfer is $0.01 / GB (see http://aws.amazon.com/ec2/pricing/, Data Transfer section). On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry rbradbe...@gmail.com wrote: Cross zone data transfer does not cost any extra money. LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical datacenter. Ensure your clients are connecting to either the local IP or the AWS hostname that is a CNAME to the local IP from within AWS. If you connect to the public IP you will get charged for outbound data transfer. On February 12, 2014 at 2:58:07 PM, Yogi Nerella (ynerella...@gmail.com) wrote: Also, maybe you need to change the read consistency to local_quorum, otherwise the servers still try to read the data from all other data centers. I can understand the latency, but I can't understand how it would save money? The amount of data transferred from the AWS server to the client should be the same no matter where the client is connected? On Wed, Feb 12, 2014 at 10:33 AM, Andrey Ilinykh ailin...@gmail.com wrote: Yes, sure. Taking data from the same zone will reduce latency and save you some money. On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox tar...@cabotresearch.com wrote: We're running a C* cluster with 6 servers spread across the four us-east-1 zones. We also spread our clients (hundreds of them) across the four zones. Currently we give our clients a connection string listing all six servers and let C* do its thing. This is all working just fine... and we're paying a fair bit in AWS transfer costs. There is a suspicion that this transfer cost is driven by us passing data around between our C* servers and clients. Would there be any value to trying to get a client to talk to one of the C* servers in its own zone? I understand (at least partially!) about coordinator nodes and replication and know that no matter which server is the coordinator for an operation, replication may cause bits to get transferred to/from servers in other zones.
Having said that...is there a chance that trying to encourage a client to initially contact a server in its own zone would help? Thank you, Brian Tarbox
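If the zones were defined as logical datacenters, as Ben suggests in the thread above, the existing DC-aware policy in the DataStax Java driver could keep each client talking to coordinators in its own zone. A sketch, assuming the driver 2.0 API and that the client's availability zone has been configured as the Cassandra datacenter "us-east-1a"; the names and contact points are illustrative, and as noted in the thread this zones-as-DCs layout is untested:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class ZoneLocalClient {
        public static Cluster build(String localZoneAsDc, String... contactPoints) {
            // Prefer coordinators in the client's own "datacenter" (really its AZ),
            // and route to token-owning replicas when possible.
            return Cluster.builder()
                    .addContactPoints(contactPoints)
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(new DCAwareRoundRobinPolicy(localZoneAsDc)))
                    .build();
        }
    }

    Usage: Cluster cluster = ZoneLocalClient.build("us-east-1a", "10.0.1.10", "10.0.2.10");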
Re: Opscenter tabs
A vaguely related question... my OpsCenter now has two separate tabs for the same cluster. One tab shows all six nodes and has their agents; the other tab has the same six nodes but no agents. I see no way to get rid of the spurious tab. On Thu, Jan 23, 2014 at 12:47 PM, Ken Hancock ken.hanc...@schange.com wrote: Multiple DCs are still a single cluster in OpsCenter. If you go to Physical View, you should see one column for each data center. Also, the Community edition of OpsCenter, last I saw, only supported a single cluster. On Thu, Jan 23, 2014 at 12:06 PM, Daniel Curry daniel.cu...@arrayent.com wrote: I am unable to find any references on whether the tabs to monitor multiple DCs can be configured to read the DC location. I do not want to change the cluster name itself. Right now I see three tabs all with the same cluster_name: test. I'd like to keep the current cluster name test, but change the OpsCenter tabs to DC1, DC2, and DC3. Is this documented somewhere? -- Daniel Curry Sr Linux Systems Administrator Arrayent, Inc. 2317 Broadway Street, Suite 20 Redwood City, CA 94063 dan...@arrayent.com -- Ken Hancock | System Architect, Advanced Advertising SeaChange International 50 Nagog Park Acton, Massachusetts 01720 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC Office: +1 (978) 889-3329
Re: bad interaction between CompositeTypes and Secondary index
The table was created this way; we also avoid altering existing tables. On Tue, Jan 21, 2014 at 4:19 PM, Jacob Rhoden jacob.rho...@me.com wrote: Was the original table created as-is, or created and then altered? It makes a difference, as I have seen this type of thing occur on tables I first created and then updated. Not sure if that issue was fixed in 2.0.4; I'm avoiding altering tables completely for now. __ Sent from iPhone On 22 Jan 2014, at 7:50 am, Brian Tarbox tar...@cabotresearch.com wrote: We're trying to use CompositeTypes and secondary indexes and are getting an assertion failure in ExtendedFilter.java line 258 (running C* 2.0.3) when we call getIndexedColumns. The assertion is for not finding any columns. The strange bit is that if we re-create the column family in question and do not set ComparatorType then things work fine. This seems odd since, as I understand it, the ComparatorType is for controlling the ordering of columns within a row and the secondary index is for finding a subset of rows that contain a particular column value... in other words they seem like they shouldn't have an interaction. It's also puzzling to us that ExtendedFilter asserts in this case... if it finds no columns I would have expected an empty return but not a failure (which our client code saw as a Timeout exception). Any clues would be appreciated. Thanks, Brian Tarbox
changing several things (almost) at once; is this the right order to make the changes?
We're making several changes and I'd like to confirm that our order of making them is reasonable. Right now we have a 4 node system at replicationFactor=2 running 1.1.6. We're moving to a 6 node system at rf=3 running 1.2.12 (I guess). We think the order should be: 1) change to rf=3 and run repair on all nodes while still at 1.1.6 2) upgrade to 1.1.10 (latest on that branch?) 3) upgrade to 1.2.12 (latest on that branch?) 4) run the convert-to-v_Node command 5) add two more servers Is that reasonable? We run in EC2 and I'm planning on testing it all on a new set of servers just in case, but figured I'd ask the experts first in case I'm doing something foolish. Thanks, Brian Tarbox
looking for advice before upgrading from 1.1.6 to 2.0.1
We're currently running our pre-production system on a 4 node EC2 cluster with C* 1.1.6. We have the luxury of a fresh install... rebuilding all our data so we can skip upgrades and just install a clean system. We obviously won't do this very often so we'd like to do it right... take advantage of new features like vnodes (so we can scale more easily... we're very likely going to 6 nodes soon but not yet) and such. We haven't made many changes to the yaml file... any advice on: - configuration settings? - any client side java level changes we have to make? - other features we should really consider? would be most appreciated. Thank you, Brian Tarbox
Re: Cassandra JVM heap sizes on EC2
The advice I heard at the New York C* conference... which we follow... is to use the m2.2xlarge and give it about 8 GB. The m2.4xlarge seems overkill (or at least overpriced). Brian On Fri, Aug 23, 2013 at 6:12 PM, David Laube d...@stormpath.com wrote: Hi All, We are evaluating our JVM heap size configuration on Cassandra 1.2.8 and would like to get some feedback from the community as to what the proper JVM heap size should be for cassandra nodes deployed on Amazon EC2. We are running m2.4xlarge EC2 instances (64GB RAM, 8 core, 2 x 840GB disks) -- so we will have plenty of RAM. I've already consulted the docs at http://www.datastax.com/documentation/cassandra/1.2/mobile/cassandra/operations/ops_tune_jvm_c.html but would love to hear what is working or not working for you in the wild. Since DataStax cautions against using more than 8GB, I'm wondering if it is even advantageous to use slightly more. Thanks, -David Laube
Re: Which of these VPS configurations would perform better for Cassandra ?
We run a cluster in EC2 and it's working very well for us. The standard seems to be M2.2XLarge instances with data living on the ephemeral drives (which means it's local and fast) and backups either to EBS, S3, or just relying on cluster size and replication (we avoid that last idea). Brian On Sun, Aug 4, 2013 at 9:02 PM, Ben Bromhead b...@instaclustr.com wrote: If you want to get a rough idea of how things will perform, fire up YCSB (https://github.com/brianfrankcooper/YCSB/wiki) and run the tests that most closely match how you think your workload will be (run the test clients from a couple of beefy AWS spot-instances for less than a dollar). As you are a new startup without any existing load/traffic patterns, benchmarking will be your best bet. Also have a look at running Cassandra with SmartOS on Joyent. When you run SmartOS on Joyent, virtualisation is done using Solaris zones, an OS based virtualisation, which is at least a quadrillion times better than KVM, Xen etc. Ok, maybe not that much… but it is pretty cool and has the following benefits: - No hardware emulation. - Shared kernel with the host (you don't have to waste precious memory running a guest OS). - ZFS :) Have a read of http://wiki.smartos.org/display/DOC/SmartOS+Virtualization for more info. There are some downsides as well: The version of Cassandra that comes with the SmartOS package management system is old and busted, so you will want to build from source. You will want to be technically confident in running on something a little outside the norm (SmartOS is based on Solaris). Just make sure you test and benchmark all your options; a few days of testing now will save you weeks of pain. Good luck! Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr On 05/08/2013, at 12:34 AM, David Schairer dschai...@humbaba.net wrote: Of course -- my point is simply that if you're looking for speed, SSD+KVM, especially in a shared tenant situation, is unlikely to perform the way you want to. If you're building a pure proof of concept that never stresses the system, it doesn't matter, but if you plan an MVP with any sort of scale, you'll want a plan to be on something more robust. I'll also say that it's really important (imho) to be doing even your dev in a config where you have consistency conditions like eventual production -- so make sure you're writing to both nodes and can have cases where eventual consistency delays kick in, or it'll come back to bite you later -- I've seen this force people to redesign their whole data model when they don't plan for it initially. As I said, I haven't tested DO. I've tested very similar configurations at other providers and they were all terrible under load -- and certainly took away most of the benefits of SSD once you stressed writes a bit. XEN+SSD, on modern kernels, should work better, but I didn't test it (linode doesn't offer this, though, and they've had lots of other challenges of late). --DRS On Aug 3, 2013, at 11:40 PM, Ertio Lew ertio...@gmail.com wrote: @David: Like all other start-ups, we too cannot start with all dedicated servers for Cassandra. So right now we have no better choice except for using a VPS :), but we can definitely choose one from amongst a suitable set of VPS configurations. As of now, since we are starting out, could we initiate our cluster with 2 nodes (RF=2), (KVM, 2GB ram, 2 cores, 30GB SSD)? Right now we won't be having a very heavy load on Cassandra for the next few months until we grow our user base.
So, this choice is mainly based on the pricing vs configuration as well as digital ocean's good reputation in the community. On Sun, Aug 4, 2013 at 12:53 AM, David Schairer dschai...@humbaba.net wrote: I've run several lab configurations on linodes; I wouldn't run cassandra on any shared virtual platform for large-scale production, just because your IO performance is going to be really hard to predict. Lots of people do, though -- depends on your cassandra loads and how consistent you need to have performance be, as well as how much of your working set will fit into memory. Remember that linode significantly oversells their CPU as well. The release version of KVM, at least as of a few months ago, still doesn't support TRIM on SSD; that, plus the fact that you don't know how others will use SSDs or if their file systems will keep the SSDs healthy, means that SSD performance on KVM is going to be highly unpredictable. I have not tested digitalocean, but I did test several other KVM+SSD shared-tenant hosting providers aggressively for cassandra a couple months ago; they all failed badly. Your mileage will vary considerably based on what you need out of cassandra, what your data patterns look like, and how you configure your system. That said, I would use xen before KVM for high-performance IO. I have not run Cassandra in any volume on
Re: too many open files
Odd that this discussion happens now as I'm also getting this error. I get a burst of error messages and then the system continues... with no apparent ill effect. I can't tell what the system was doing at the time... here is the stack. BTW OpsCenter says I only have 4 or 5 SSTables in each of my 6 CFs.

ERROR [ReadStage:62384] 2013-07-14 18:04:26,062 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:62384,5,main]
java.io.IOError: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:69)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:898)
at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:63)
at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:61)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:124)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:64)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:41)
at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:63)
... 16 more

On Mon, Jul 15, 2013 at 7:23 AM, Michał Michalski mich...@opera.com wrote: It doesn't tell you anything if a file ends with ic-, except pointing out the SSTable version it uses (ic in this case). Files related to a secondary index contain something like this in the filename: KS-CF.IDX-NAME, while regular CFs do not contain any dots except the one just before the file extension. M. On 15.07.2013 at 09:38, Paul Ingalls wrote: Also, looking through the log, it appears a lot of the files end with ic- which I assume is associated with a secondary index I have on the table. Are secondary indexes really expensive from a file descriptor standpoint? That particular table uses the default compaction scheme...
On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote: I have one table that is using leveled. It was set to 10MB, I will try changing it to 256MB. Is there a good way to merge the existing sstables? On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote: I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Alternate major compaction
Perhaps I should already know this, but why is running a major compaction considered so bad? We're running 1.1.6. Thanks. On Thu, Jul 11, 2013 at 7:51 AM, Takenori Sato ts...@cloudian.com wrote: Hi, I think it is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause; there are others. For example, I see two typical scenarios: 1. backup use case 2. active wide row. In the case of 1, say, a piece of data is removed a year later. This means the tombstone on the row is 1 year away from the original row. To remove an expired row entirely, a compaction set has to include all the rows. So, when are the original, 1-year-old row and the tombstoned row included in the same compaction set? It is likely to take one year. In the case of 2, such an active wide row exists in most of the sstable files, and it typically contains many expired columns. But none of them would be removed entirely because a compaction set practically does not include all the row fragments. Btw, there is a very convenient MBean API available: CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. Then, I wrote a tool to check garbage, and it prints out some useful information to find such an optimal set. Here's a simple log output.

# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db
---
TOTAL, 40, 40
===

REMAINNING_SSTABLE_FILES means any other sstable files that contain the respective row. So, the following is an optimal set.

# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 223, 0, YES, YES
---
TOTAL, 223, 0
===

This tool relies on SSTableReader and an aggregation iterator, as Cassandra does in compaction. I was considering sharing this with the community, so let me know if anyone is interested. Ah, note that it is based on 1.0.7, so I will need to check and update it for newer versions. Thanks, Takenori On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.com wrote: Hi, About a year ago we did a major compaction in our cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using the SizeTieredCompaction strategy, and we've not evaluated LeveledCompaction yet, because it has its downsides and we've had no time to test all of them in our environment). I was trying to find a way to solve this situation (that is, do something like a major compaction that writes small sstables, not huge ones as major compaction does), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states).
Then I tried deleting all data on a node and bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned of deleted records and updates. But the deleted node just copied the sstables from another node as they were, cleaning nothing. So I tried a new approach: I switched the sstable compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switched it back (Leveled to SizeTiered). It took a while (but so does the major compaction process) and it worked: I have smaller sstables, and I've regained a lot of disk space. I'm happy with the results, but it doesn't seem an orthodox way of cleaning the sstables. What do you think, is it something wrong or crazy? Is there a different way to achieve the same thing? Let's put an example: Suppose you have a write-only
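For anyone wanting to drive the forceUserDefinedCompaction operation that Takenori mentions without writing a full tool, a JMX sketch follows. It assumes the CompactionManager MBean name used by the 1.x line and the two-argument (keyspace, comma-separated data file list) operation signature of that era; the signature has changed across Cassandra versions, so check the CompactionManagerMBean of the release actually in use, and the keyspace and file names below are taken from the example output above purely for illustration:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UserDefinedCompaction {
        public static void main(String[] args) throws Exception {
            // Connect to the local node's JMX port (7199 by default).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");

            // Compact exactly the SSTables chosen (e.g. from the garbage-checking tool above).
            mbs.invoke(compactionManager,
                    "forceUserDefinedCompaction",
                    new Object[] { "UserData", "Test5_BLOB-hc-3-Data.db,Test5_BLOB-hc-4-Data.db" },
                    new String[] { String.class.getName(), String.class.getName() });

            connector.close();
        }
    }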
Re: Unreachable Nodes
Have to disagree with the "does no harm" comment just a tiny bit. I had a similar situation recently and coincidentally needed to do a CF truncate. The system rejected the request, saying that not all nodes were up. Nodetool ring said everyone was up, but nodetool gossipinfo said there were vestiges of dead nodes still hanging around. I ended up restarting the entire cluster, which cleared the issue. Brian On Wed, May 22, 2013 at 6:46 AM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: Hello, Thanks for your fast response. That makes sense. I'll just keep an eye on it then. Many thanks, Vasilis On Wed, May 22, 2013 at 10:54 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi. I think that the unsafeAssassinateEndpoint was the good solution here. I was going to lead you to this solution after reading the first part of your message. "Does anyone know why the dead nodes still appear when we run nodetool gossipinfo but they don't when we run describe cluster from the CLI?" That's a good thing. Gossiper just keeps this information for a while (7 or 10 days by default, off the top of my head), but this doesn't harm your cluster in any way, whereas having UNREACHABLE nodes could have been annoying. By the way, gossipinfo shows you those nodes as STATUS:LEFT, which is good. I am quite sure that this status changed when you used the JMX unsafeAssassinateEndpoint. "do a full cluster restart (I presume that means a rolling restart - not shut-down the entire cluster right???)" A full restart = entire cluster down = down time. It is precisely *not* a rolling restart. To conclude I would say that your cluster seems healthy now (from what I can see), you have no more ghost nodes and nothing to do. Just wait a week or so and look at gossipinfo again. 2013/5/22 Vasileios Vlachos vasileiosvlac...@gmail.com Hello All, A while ago we had 3 cassandra nodes on Amazon. At some point we decided to buy some servers and deploy cassandra there. The problem is that since then we have a list of dead IPs listed as UNREACHABLE nodes when we run describe cluster on cassandra-cli. I have seen other posts which describe similar issues, and the bottom line is it's harmless, but if you want to get rid of it do a full cluster restart (I presume that means a rolling restart - not shut-down the entire cluster right???). Anyway... We also came across another solution: Install libmx4j-java, uncomment the respective line on /etc/default/cassandra, restart the node, go to http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper;, type in the dead IP/IPs next to unsafeAssassinateEndpoint and invoke it. So we did that on one of the nodes for the list of dead IPs. After running describe cluster on the CLI on every node, we noticed that there were no UNREACHABLE nodes and everything looked OK.
However, when we run nodetool gossipinfo we get the following output:

/10.1.32.97
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.76851457173E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,56713727820156410577229101238628035243
/10.128.16.111
REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
/10.128.16.110
REMOVAL_COORDINATOR:REMOVER,1
STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
/10.1.32.100
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.75649392881E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,85070591730234615865843651857942052863
/10.1.32.101
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.71158702006E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,141784319550391026443072753096570088105
/10.1.32.98
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.73163150773E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,113427455640312821154458202477256070486
/10.128.16.112
REMOVAL_COORDINATOR:REMOVER,1
STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
/10.1.32.99
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.72271268395E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,28356863910078205288614550619314017621
/10.1.32.96
RELEASE_VERSION:1.0.11
SCHEMA:b1116df0-b3dd-11e2--16fe4da5dbff
LOAD:2.71494331357E11
RPC_ADDRESS:0.0.0.0
STATUS:NORMAL,0

Does anyone know why the dead nodes still appear when we run nodetool gossipinfo but they don't when we run describe cluster from the CLI? Thank you in advance for your help, Vasilis
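The same Gossiper operation that the thread invokes through the mx4j web page can also be called directly over JMX. A sketch, assuming the MBean name shown in the mx4j URL above and a node listening on the default JMX port; the host and dead IP are illustrative, and as the operation name says, it should only be used for nodes that are truly gone:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class AssassinateDeadNode {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://cassandra_node:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            ObjectName gossiper = new ObjectName("org.apache.cassandra.net:type=Gossiper");

            // Remove the ghost endpoint from gossip state.
            mbs.invoke(gossiper,
                    "unsafeAssassinateEndpoint",
                    new Object[] { "10.128.16.110" },
                    new String[] { String.class.getName() });

            connector.close();
        }
    }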
best practices on EC2 question
From this list and the NYC* conference it seems that the consensus configuration of C* on EC2 is to put the data on an ephemeral drive and then periodically back the drive up to S3... relying on C*'s inherent fault tolerance to deal with any data loss. Fine, and we're doing this... but we find that transfer rates from S3 back to a rebooted server instance are *very* slow... like 15 MB/second, or roughly a minute per gigabyte. Calling EC2 support resulted in them saying sorry, that's how it is. I'm wondering if anyone a) has found a faster way to transfer to S3, or b) do people skip backups altogether except for huge outages and just let rebooted server instances come up empty to repopulate via C*? An alternative that we had explored for a while was to do a two stage backup: 1) copy a C* snapshot from the ephemeral drive to an EBS drive 2) do an EBS snapshot to S3. The idea being that EBS is quite reliable, S3 is still the emergency backup, and copying back from EBS to ephemeral is likely much faster than the 15 MB/sec we get from S3. Thoughts? Brian
how to monitor nodetool cleanup?
I'm recovering from a significant failure and so am doing lots of nodetool move, removetoken, repair and cleanup. For most of these I can do nodetool netstats to monitor progress but it doesn't show anything for cleanup...how can I monitor the progress of cleanup? On a related note: I'm able to stop all client access to the cluster until things are happy again...is there anything I can do to make move/repair/cleanup go faster? FWIW my problems came from trying to move nodes between EC2 availability zones...which led to 1) killing a node and recreating it in another availability zone 2) new node had different local ip address so cluster thought old node was just down and we had a new node... I did the removetoken on the dead node and gave the new node oldToken-1...but things still got weird and I ended up spending a couple of days cleaning up (which seems odd for only about 300 gig total data). Anyway, any suggestions for monitoring / speeding up cleanup would be appreciated. Brian Tarbox
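Cleanup runs as a compaction-style task, so its progress shows up under the CompactionManager MBean (the same data nodetool compactionstats displays) rather than in netstats. A JMX polling sketch, assuming the Compactions attribute exposed by the 1.2-era CompactionManagerMBean; attribute layout can differ between versions, and the host and poll interval are illustrative:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CleanupProgress {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");

            // Poll the list of in-flight compaction-style tasks; cleanup shows up here.
            while (true) {
                Object tasks = mbs.getAttribute(compactionManager, "Compactions");
                System.out.println(tasks);
                Thread.sleep(10000);   // loop forever; stop the program when done watching
            }
        }
    }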
Re: Cassandra Summit 2013
Jonathan, I'm a bit puzzled. I had planned to attend Cassandra's major conference in the summer but then the NYC* conference was announced. I spoke with DataStax and was told that there was no summer conference this year and that NYC* was all there was. So, I spent my conference time/budget on it and so now can't attend SFSummit25. NYC* was great but I feel that I got misled...or did I misunderstand somehow??? Brian Tarbox On Fri, Apr 12, 2013 at 11:50 AM, Jonathan Ellis jbel...@gmail.com wrote: Hi all, Last year's Summit saw fantastic talks [1] and over 800 attendees. The feedback was enthusiastic; the most commonly requested improvement was to extend it to two days. We're pleased to deliver just that for 2013! This year's Cassandra Summit will be at Fort Mason in San Francisco, California from June 11th - 12th, with 45+ sessions covering Cassandra use cases, development tips and tricks, war stories, how-tos, and more. The popular meet the experts room will also return. Engineers and committers from companies such as Spotify, eBay, Netflix, Comcast, BlueMountain Capital, and DataStax will be there excited to share their Cassandra experiences. The schedule of talks is about 90% final. To view it and register, visit http://www.datastax.com/company/news-and-events/events/cassandrasummit2013 and use the code SFSummit25 for 25% off. See you there! [1] http://www.datastax.com/company/news-and-events/events/cassandrasummit2012/presentations -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
any other NYC* attendees find your usb stick of the proceedings empty?
Last week I attended DataStax's NYC* conference and one of the give-aways was a wooden USB stick. Finally getting around to loading it I find it empty. Anyone else have this problem? Are the conference presentations available somewhere else? Brian Tarbox
Re: cfhistograms
I think we all go through this learning curve. Here is the answer I gave last time this question was asked: The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output should I read it as: my reads all took either 1 or 2 sstable. And separately, I had write latencies of 3,7,19. And separately I had read latencies of 2, 8,69, etc? In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0

On Mon, Mar 25, 2013 at 11:52 AM, Kanwar Sangha kan...@mavenir.com wrote: Can someone explain how to read the cfhistograms o/p?

[root@db4 ~]# nodetool cfhistograms usertable data
usertable/data histograms
Offset   SSTables  Write Latency  Read Latency  Row Size  Column Count
1         2857444           4051             0         0        342711
2         6355104          27021             0         0        201313
3         2579941          61600             0         0        130489
4          374067         119286             0         0         91378
5            9175         210934             0         0         68548
6               0         321098             0         0         54479
7               0         476677             0         0         45427
8               0         734846             0         0         38814
10              0        2867967             4         0         65512
12              0        5366844            22         0         59967
14              0        6911431            36         0         63980
17              0       10155740           127         0        115714
20              0        7432318           302         0        138759
24              0        5231047           969         0        193477
29              0        2368553          2790         0        209998
35              0         859591          4385         0        204751
42              0         456978          3790         0        214658
50              0         306084          2465         0        151838
60              0         223202          2158         0         40277
72              0         122906          2896         0          1735

Thanks Kanwar
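To make the "each column is its own histogram" point concrete, here is a small sketch that reads one column as (bucket offset, count) pairs -- the Read Latency column from the first table above -- and walks the buckets to find an approximate percentile. It is only an illustration of how the buckets are meant to be read, not a tool.

// Sketch: treating one cfhistograms column as an independent histogram.
// Each entry is (bucket offset, count); here, the Read Latency column from the
// table above: 2 reads in the 8-microsecond bucket, 2 in 12, 8 in 14, 69 in 17,
// 163 in 20, 1369 in 24.
public class HistogramColumn {
    public static void main(String[] args) {
        long[] offsets = {8, 12, 14, 17, 20, 24};
        long[] counts  = {2,  2,  8, 69, 163, 1369};

        long total = 0;
        for (long c : counts) total += c;

        // Walk the buckets until we pass 99% of the observations.
        long seen = 0;
        for (int i = 0; i < offsets.length; i++) {
            seen += counts[i];
            if (seen >= 0.99 * total) {
                System.out.println("~p99 read latency <= " + offsets[i] + " microseconds");
                break;
            }
        }
        System.out.println(total + " reads in this histogram");
    }
}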
Re: what addresses to use in EC2 cluster (whenever an instance restarts it gets a new private ip)?
When my EC2 instance failed I restarted it, and added the new private IP address to the list of seed nodes (was this my error?). Nodetool then showed 4 live nodes and one dead one (corresponding to the old private IP address). I'm guessing that what I should have done on the restarted node is start it with -Dreplace_token? In such cases what should I do with the list of seed nodes? I think this is a great opportunity for a technical paper or something on how to set up Cassandra on EC2. :-) BTW: I'm running with encrypted disks running live on ephemeral drives that get periodically copied back to EBS stores so I don't lose anything. Brian On Tue, Feb 12, 2013 at 12:20 PM, aaron morton aa...@thelastpickle.com wrote: Cassandra handles nodes changing IP. The important thing to Cassandra is the token, not the IP. In your case did the replacement node have the same token as the failed one? You can normally work around these issues using commands like nodetool removetoken. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 12/02/2013, at 10:04 AM, Andrey Ilinykh ailin...@gmail.com wrote: You have to use private IPs, but if an instance dies you have to bootstrap it with the replace token flag. If you use EC2 I'd recommend Netflix's Priam tool. It manages all that stuff, plus you have S3 backup. Andrey On Mon, Feb 11, 2013 at 11:35 AM, Brian Tarbox tar...@cabotresearch.com wrote: How do I configure my cluster to run in EC2? In my cassandra.yaml I have IP addresses under seed_provider, listen_address and rpc_address. I tried setting up my cluster using just the EC2 private addresses but when one of my instances failed and I restarted it there was a new private address. Suddenly my cluster thought it had five nodes rather than four. Then I tried using Elastic IP addresses (permanent addresses) but it turns out you get charged for network traffic between elastic addresses even if they are within the cluster. So...how do you configure the cluster when the IP addresses can change out from under you? Thanks. Brian Tarbox
what addresses to use in EC2 cluster (whenever an instance restarts it gets a new private ip)?
How do I configure my cluster to run in EC2? In my cassandra.yaml I have IP addresses under seed_provider, listen_address and rpc_address. I tried setting up my cluster using just the EC2 private addresses but when one of my instances failed and I restarted it there was a new private address. Suddenly my cluster thought it had five nodes rather than four. Then I tried using Elastic IP addresses (permanent addresses) but it turns out you get charged for network traffic between elastic addresses even if they are within the cluster. So...how do you configure the cluster when the IP addresses can change out from under you? Thanks. Brian Tarbox
Re: Upcoming conferences
At what level will the NY talks be? I had been planning on attending Datastax's big summer conference and I might not be able to get approval for both, so I'd like to hear more about this one. On Wed, Jan 30, 2013 at 12:40 PM, Jonathan Ellis jbel...@gmail.com wrote: ApacheCon North America (Portland, Feb 26-28) has a Cassandra track on the 28th: http://na.apachecon.com/schedule/ NY C* Tech Day (NY, March 20) is a 2-track, one-day conference devoted to Cassandra: http://datastax.com/nycassandra2013/ See you there! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
trying to create encrypted ephemeral drive on Amazon
I've heard that on Amazon EC2 I should be using ephemeral drives...but I want/need to be using encrypted volumes. On my local machine I use cryptsetup to encrypt a device and then mount it and so on...but on Amazon I get the error: Cannot open device /dev/xvdb for read-only access. Reading further I wonder if this is even possible based on this statement in the Amazon doc set *An instance store is dedicated to a particular instance; however, the disk subsystem is shared among instances on a host computer* How are other folks achieving performance and encryption on EC2? Thanks.
Re: Is this how to read the output of nodetool cfhistograms?
Wei, Thank you for the explanation (Offset is always the x-axis, the other columns represent the y-axis (taken 5 independent times)). Part of this still doesn't make sense. If I look at just read latencies for example...am I to believe that 1916 times I had a latency of exactly 3229500 usecs? Is this just some weird 5-independent variable mushed together data bucketing???

Offset  SSTables  Write Lat  Read Lat
1109           0        349    642406
1331           0        147   1335840
1597           0        121    640374
1916           0        117   3229500
2299           0         91    683749
2759           0         77    202722

On Tue, Jan 22, 2013 at 12:11 PM, Wei Zhu wz1...@yahoo.com wrote: I agree that Cassandra cfhistograms is probably the most bizarre metrics I have ever come across although it's extremely useful. I believe the offset is actually the metrics it has tracked (x-axis on the traditional histogram) and the number under each column is how many times that value has been recorded (y-axis on the traditional histogram). Your write latency are 17, 20, 24 (microseconds?). 3 writes took 17, 7 writes took 20 and 19 writes took 24. Correct me if I am wrong. Thanks. -Wei -- From: Brian Tarbox tar...@cabotresearch.com To: user@cassandra.apache.org Sent: Tuesday, January 22, 2013 7:27 AM Subject: Re: Is this how to read the output of nodetool cfhistograms? Indeed, but how many Cassandra users have the good fortune to stumble across that page? Just saying that the explanation of the very powerful nodetool commands should be more front and center. Brian On Tue, Jan 22, 2013 at 10:03 AM, Edward Capriolo edlinuxg...@gmail.com wrote: This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox tar...@cabotresearch.com wrote: Thank you! Since this is a very non-standard way to display data it might be worth a better explanation in the various online documentation sets. Thank you again. Brian On Tue, Jan 22, 2013 at 9:19 AM, Mina Naguib mina.nag...@adgear.com wrote: On 2013-01-22, at 8:59 AM, Brian Tarbox tar...@cabotresearch.com wrote: The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output should I read it as: my reads all took either 1 or 2 sstable. And separately, I had write latencies of 3,7,19. And separately I had read latencies of 2, 8,69, etc? In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right? Correct. A number in any of the metric columns is a count value bucketed in the offset on that row. There are no relationships between other columns on the same row. So your first row says 16033 reads were satisfied by 1 sstable. The other metrics (for example, latency of these reads) is reflected in the histogram under Read Latency, under various other bucketed offsets.

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0
Is this how to read the output of nodetool cfhistograms?
The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output should I read it as: my reads all took either 1 or 2 sstable. And separately, I had write latencies of 3,7,19. And separately I had read latencies of 2, 8,69, etc? In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0
Re: Is this how to read the output of nodetool cfhistograms?
Thank you! Since this is a very non-standard way to display data it might be worth a better explanation in the various online documentation sets. Thank you again. Brian On Tue, Jan 22, 2013 at 9:19 AM, Mina Naguib mina.nag...@adgear.com wrote: On 2013-01-22, at 8:59 AM, Brian Tarbox tar...@cabotresearch.com wrote: The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output should I read it as: my reads all took either 1 or 2 sstable. And separately, I had write latencies of 3,7,19. And separately I had read latencies of 2, 8,69, etc? In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right? Correct. A number in any of the metric columns is a count value bucketed in the offset on that row. There are no relationships between other columns on the same row. So your first row says 16033 reads were satisfied by 1 sstable. The other metrics (for example, latency of these reads) is reflected in the histogram under Read Latency, under various other bucketed offsets.

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0
Re: Is this how to read the output of nodetool cfhistograms?
Indeed, but how many Cassandra users have the good fortune to stumble across that page? Just saying that the explanation of the very powerful nodetool commands should be more front and center. Brian On Tue, Jan 22, 2013 at 10:03 AM, Edward Capriolo edlinuxg...@gmail.com wrote: This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox tar...@cabotresearch.com wrote: Thank you! Since this is a very non-standard way to display data it might be worth a better explanation in the various online documentation sets. Thank you again. Brian On Tue, Jan 22, 2013 at 9:19 AM, Mina Naguib mina.nag...@adgear.com wrote: On 2013-01-22, at 8:59 AM, Brian Tarbox tar...@cabotresearch.com wrote: The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output should I read it as: my reads all took either 1 or 2 sstable. And separately, I had write latencies of 3,7,19. And separately I had read latencies of 2, 8,69, etc? In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right? Correct. A number in any of the metric columns is a count value bucketed in the offset on that row. There are no relationships between other columns on the same row. So your first row says 16033 reads were satisfied by 1 sstable. The other metrics (for example, latency of these reads) is reflected in the histogram under Read Latency, under various other bucketed offsets.

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0
Re: How can OpsCenter show me Read Request Latency when there are no read requests??
Hmm, that makes sense, but then why is the latency for the reads that get the metric often so high (several thousand uSecs) and why does it so closely track the latency of my normal reads? On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote: When you view OpsCenter metrics, you're generating a small number of reads to fetch the metric data, which is why your read count is near zero instead of actually being zero. Since reads are still occurring, Cassandra will continue to show a read latency. Basically, you're just viewing the latency on the reads to fetch metric data. Normally the number of reads required to view metrics are small enough that they only make a minor difference in your overall read latency average, but when you have no other reads occurring, they're the only reads that are included in the average. On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox tar...@cabotresearch.com wrote: I am making heavy use of DataStax OpsCenter to help tune my system and it's great. And yet puzzling. I see my clients do a burst of Reads causing the OpsCenter Read Requests chart to go up and stay up until the clients finish doing their reads. The read request latency chart also goes up, but it stays up even after all the reads are done. At last glance I've had next to zero reads for 10 minutes but still have a read request latency that's basically unchanged from when there were actual reads. How am I to interpret this? Thanks. Brian Tarbox -- Tyler Hobbs DataStax http://datastax.com/
trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests
We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the row cache on one node in my cluster to see if I could detect an effect of using it. The problem is that nodetool info says that even with a two gig row_cache we're getting zero requests. Since my client program is actively processing, and since keycache shows lots of activity, I'm puzzled. Shouldn't any read of a column cause the entire row to be loaded? My entire data file is only 32 gig right now so it's hard to imagine that 2 gig is too small to hold even a single row? Any suggestions how to proceed are appreciated. Thanks. Brian Tarbox
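For what it's worth, row_cache_size_in_mb in cassandra.yaml only sizes the global cache; each column family also has to opt in through its caching option (the default is keys-only), and if none do, the row cache will report zero requests. The sketch below shows the CQL3 form of that change using the DataStax Java driver (2.0-era API); the keyspace/table names are placeholders, on 1.1 the equivalent change is made through cassandra-cli, and the caching option's syntax changed again in later releases.

// Sketch: opting one table into the row cache (CQL3 / DataStax Java driver).
// Keyspace and table names are placeholders; the caching option became a map
// ({'keys': ..., 'rows_per_partition': ...}) in later Cassandra versions.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EnableRowCache {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect();
            // row_cache_size_in_mb in cassandra.yaml only sizes the global cache;
            // a table left at the default keys-only caching never touches it.
            session.execute("ALTER TABLE mykeyspace.mytable WITH caching = 'rows_only'");
        } finally {
            cluster.close();
        }
    }
}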
How can OpsCenter show me Read Request Latency when there are no read requests??
I am making heavy use of DataStax OpsCenter to help tune my system and it's great. And yet puzzling. I see my clients do a burst of Reads causing the OpsCenter Read Requests chart to go up and stay up until the clients finish doing their reads. The read request latency chart also goes up, but it stays up even after all the reads are done. At last glance I've had next to zero reads for 10 minutes but still have a read request latency that's basically unchanged from when there were actual reads. How am I to interpret this? Thanks. Brian Tarbox
is there a way to list who is connected to my cluster?
I'd like to be able to find out which processes are connected to my cluster...is there a way to do this? The root problem is that someone is doing an operation or set of operations that is causing DataStax to show high read latency. In trying to find out which of our various programs is doing this I've turned off all of the programs I know of. But DataStax is still saying there are clients running against us...how can I find them? Thanks. Brian Tarbox
puzzled why my cluster is slowing down
I have a 4 node cluster with lots of JVM memory and lots of system memory that slows down when I'm doing lots of writes. Running DataStax charts I see my read and write latency rise from 50-100 u-secs to 1500-4500 u-secs. This is across a 12 hour data load during which time the applied load is high but fairly constant (500-700 writes/sec). I'm trying to understand the slowdown: there is no memory pressure, I've run every option under nodetool to look for bottlenecks (tpstats, compactionStats, etc) and see none. I'm running with keycache and have about 98% hits. What can I check next? Thanks! Brian Tarbox
help tuning compaction..hours of run to get 0% compaction....
I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a time. At some point my performance falls off a cliff due to time spent doing compactions. I'm seeing row after row of logs saying that after 1 or 2 hours of compacting it reduced the data to 100% or 99% of the original. I'm trying to understand what direction this data points me to in terms of configuration changes. a) increase my compaction_throughput_mb_per_sec because I'm falling behind (am I falling behind?) b) enable multi-threaded compaction? Any help is appreciated. Brian
Re: help tuning compaction..hours of run to get 0% compaction....
I have not specified leveled compaction so I guess I'm defaulting to size tiered? My data (in the column family causing the trouble) is insert-once, read-many, update-never. Brian On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman mkjell...@barracuda.com wrote: Size tiered or leveled compaction? From: Brian Tarbox tar...@cabotresearch.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Monday, January 7, 2013 12:03 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: help tuning compaction..hours of run to get 0% compaction I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a time. At some point my performance falls off a cliff due to time spent doing compactions. I'm seeing row after row of logs saying that after 1 or 2 hours of compacting it reduced the data to 100% or 99% of the original. I'm trying to understand what direction this data points me to in terms of configuration changes. a) increase my compaction_throughput_mb_per_sec because I'm falling behind (am I falling behind?) b) enable multi-threaded compaction? Any help is appreciated. Brian -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
Re: help tuning compaction..hours of run to get 0% compaction....
The problem I see is that it already takes me more than 24 hours just to load my data...during which time the logs say I'm spending tons of time doing compaction. For example in the last 72 hours I've consumed *20 hours* per machine on compaction. Can I conclude from that that I should be (perhaps drastically) increasing my compaction_throughput_mb_per_sec on the theory that I'm getting behind? The fact that it takes me 3 days or more to run a test means it's hard to just play with values and see what works best, so I'm trying to understand the behavior in detail. Thanks. Brian On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman mkjell...@barracuda.com wrote: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction If you perform at least twice as many reads as you do writes, leveled compaction may actually save you disk I/O, despite consuming more I/O for compaction. This is especially true if your reads are fairly random and don’t focus on a single, hot dataset. From: Brian Tarbox tar...@cabotresearch.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Monday, January 7, 2013 12:56 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: help tuning compaction..hours of run to get 0% compaction I have not specified leveled compaction so I guess I'm defaulting to size tiered? My data (in the column family causing the trouble) is insert-once, read-many, update-never. Brian On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman mkjell...@barracuda.com wrote: Size tiered or leveled compaction? From: Brian Tarbox tar...@cabotresearch.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Monday, January 7, 2013 12:03 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: help tuning compaction..hours of run to get 0% compaction I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a time. At some point my performance falls off a cliff due to time spent doing compactions. I'm seeing row after row of logs saying that after 1 or 2 hours of compacting it reduced the data to 100% or 99% of the original. I'm trying to understand what direction this data points me to in terms of configuration changes. a) increase my compaction_throughput_mb_per_sec because I'm falling behind (am I falling behind?) b) enable multi-threaded compaction? Any help is appreciated. Brian -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
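If, per the article Michael linked, leveled compaction does turn out to fit the workload (it generally pays off for read-heavy tables at the cost of more compaction I/O), the switch itself is a one-statement schema change. Below is a sketch using CQL3 and the DataStax Java driver (2.0-era API); the keyspace/table names are placeholders, and on 1.1-era clusters the same change is made through cassandra-cli instead.

// Sketch: switching a table from size-tiered to leveled compaction (CQL3,
// Cassandra 1.2+). Keyspace and table names are placeholders.
import com.datastax.driver.core.Cluster;

public class SwitchToLeveled {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            // Leveled compaction trades more compaction I/O for fewer SSTables
            // touched per read, which the linked article recommends mainly for
            // read-heavy tables.
            cluster.connect().execute(
                "ALTER TABLE mykeyspace.mytable " +
                "WITH compaction = {'class': 'LeveledCompactionStrategy'}");
        } finally {
            cluster.close();
        }
    }
}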
Re: Cassandra API for Java
Anyone still use Pelops? On Sun, Dec 30, 2012 at 12:19 PM, Shahryar Sedghi shsed...@gmail.com wrote: I use JDBC with Cassandra 1.1 with CQL 3. I tried both Hector and Thrift and JDBC is much easier to code; I never tried Astyanax. Application servers have built-in connection pooling support for JDBC, but do not provide fail over to other machines, you need to do it at the application level. Another caveat: with both Hector and Thrift without CQL you can retrieve all or a portion of the row keys; CQL (at least on 1.1) does not give you distinct row keys. If you have a use case like this, either you need a hybrid API solution or stick with another API. Regards Shahryar On Fri, Dec 28, 2012 at 10:34 PM, Michael Kjellman mkjell...@barracuda.com wrote: This was asked as recently as one month + 1 day ago btw: http://grokbase.com/t/cassandra/user/12bve4d8e8/java-high-level-client has the longer discussion if you weren't subscribed to the group to see the messages. From: Baskar Sikkayan techba...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Friday, December 28, 2012 7:24 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Cassandra API for Java Hi, I am new to Apache Cassandra. Could you please suggest a good Java API (Hector, Thrift, or ...) for Cassandra? Thanks, Baskar.S +91 97394 76008 -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f -- Life is what happens while you are making other plans. ~ John Lennon
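For completeness, the (then fairly new) DataStax Java driver is another option for CQL3 over the native protocol. Below is a minimal sketch of a connect-and-query round trip using its 2.0-era API; the contact point, keyspace, table, and column names are placeholders, and it assumes Cassandra 1.2+ with the native transport enabled.

// Sketch: a minimal CQL3 round trip with the DataStax Java driver, one more option
// alongside the Hector/Astyanax/JDBC choices discussed above. All names below are
// placeholders.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DriverExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // any live node; the driver discovers the rest
                .build();
        try {
            Session session = cluster.connect("mykeyspace");
            ResultSet rs = session.execute("SELECT id, ts FROM mytable LIMIT 10");
            for (Row row : rs) {
                System.out.println(row.getLong("id") + " @ " + row.getDate("ts"));
            }
        } finally {
            cluster.close();
        }
    }
}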
Re: State of Cassandra and Java 7
What I saw in all cases was: a) set JAVA_HOME to Java 7, run the program, it fails; b) set JAVA_HOME to Java 6, run the program, it succeeds. I should have better notes but I'm at a 6 person startup so working tools get used and failing tools get deleted. Brian On Fri, Dec 21, 2012 at 3:54 PM, Bryan Talbot btal...@aeriagames.com wrote: Brian, did any of your issues with java 7 result in corrupting data in cassandra? We just ran into an issue after upgrading a test cluster from Cassandra 1.1.5 and Oracle JDK 1.6.0_29-b11 to Cassandra 1.1.7 and 7u10. What we saw is values in columns with validation Class=org.apache.cassandra.db.marshal.LongType that were proper integers becoming corrupted so that they were stored as strings. I don't have a reproducible test case yet but will work on making one over the holiday if I can. For example, a column with a long type that was originally written and stored properly (say with value 1200) was somehow changed during cassandra operations (compaction seems the only possibility) to be the value '1200' with quotes. The data was written using the phpcassa library and that application and library haven't been changed. This has only happened on our test cluster which was upgraded and hasn't happened on our live cluster which was not upgraded. Many of our column families were affected and all affected columns are Long (or bigint for cql3). Errors when reading using the CQL3 command client look like this: Failed to decode value '1356441225' (for column 'expires') as bigint: unpack requires a string argument of length 8 and when reading with cassandra-cli the error is [default@cf] get token['fbc1e9f7cc2c0c2fa186138ed28e5f691613409c0bcff648c651ab1f79f9600b']; = (column=client_id, value=8ec4c29de726ad4db3f89a44cb07909c04f90932d, timestamp=1355836425784329, ttl=648000) A long is exactly 8 bytes, not 10. -Bryan On Mon, Dec 17, 2012 at 7:33 AM, Brian Tarbox tar...@cabotresearch.com wrote: I was using jre-7u9-linux-x64 which was the latest at the time. I'll confess that I did not file any bugs...at the time the advice from both the Cassandra and Zookeeper lists was to stay away from Java 7 (and my boss had had enough of my reporting that *the problem was Java 7* for me to spend a lot more time getting the details). Brian On Sun, Dec 16, 2012 at 4:54 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Sat, Dec 15, 2012 at 7:12 PM, Michael Kjellman mkjell...@barracuda.com wrote: What issues have you run into? Actually curious because we push 1.1.5-7 really hard and have no issues whatsoever. A related question is which version of java 7 did you try? The first releases of java 7 were apparently famous for having many issues but it seems the more recent updates are much more stable. -- Sylvain On Dec 15, 2012, at 7:51 AM, Brian Tarbox tar...@cabotresearch.com wrote: We've reverted all machines back to Java 6 after running into numerous Java 7 issues...some running Cassandra, some running Zookeeper, others just general problems. I don't recall any other major language release being such a mess. On Fri, Dec 14, 2012 at 5:07 PM, Bill de hÓra b...@dehora.net wrote: At least that would be one way of defining officially supported. Not quite, because, Datastax is not Apache Cassandra. the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to.
So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. Exactly right. If enough people use Cassandra on Java7 and enough people file bugs about Java 7 and enough people work on bugs for Java 7 then Cassandra will eventually work well enough on Java7. Bill On 14 Dec 2012, at 19:43, Drew Kutcharian d...@venarc.com wrote: In addition, the DataStax official documentation states: Versions earlier than 1.6.0_19 should not be used. Java 7 is not recommended. http://www.datastax.com/docs/1.1/install/install_rpm On Dec 14, 2012, at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: Does Datastax (or any other company) support Cassandra under Java 7? Or will they tell you to downgrade when you have some problem, because they don't support C* running on 7? At least that would be one way of defining officially supported. On Fri, Dec 14, 2012 at 2:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: What kind of official statement do you want? As far as I can be considered an official voice of the project, my statement is: various people run in production with Java 7 and it seems to work. Or to answer the initial question, the only issue related to Java 7 that I
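Bryan's report above comes down to a LongType (bigint) value being exactly 8 big-endian bytes on disk, while the corrupted form is the ASCII string of the number (10 bytes for '1356441225'), which is exactly what the "length 8" unpack error is complaining about. A quick sketch of the difference, with values hard-coded for illustration:

// Sketch: a LongType/bigint value serializes to exactly 8 big-endian bytes;
// the decimal string form of the same number is 10 bytes, which no longer
// decodes as a long on the client side.
import java.nio.ByteBuffer;

public class LongBytes {
    public static void main(String[] args) throws Exception {
        long expires = 1356441225L;

        byte[] asLong = ByteBuffer.allocate(8).putLong(expires).array();
        byte[] asString = String.valueOf(expires).getBytes("US-ASCII");

        System.out.println("LongType encoding: " + asLong.length + " bytes");   // prints 8
        System.out.println("String encoding:   " + asString.length + " bytes"); // prints 10
    }
}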
Re: State of Cassandra and Java 7
I was using jre-7u9-linux-x64 which was the latest at the time. I'll confess that I did not file any bugs...at the time the advice from both the Cassandra and Zookeeper lists was to stay away from Java 7 (and my boss had had enough of my reporting that *the problem was Java 7* for me to spend a lot more time getting the details). Brian On Sun, Dec 16, 2012 at 4:54 AM, Sylvain Lebresne sylv...@datastax.comwrote: On Sat, Dec 15, 2012 at 7:12 PM, Michael Kjellman mkjell...@barracuda.com wrote: What issues have you ran into? Actually curious because we push 1.1.5-7 really hard and have no issues whatsoever. A related question is which which version of java 7 did you try? The first releases of java 7 were apparently famous for having many issues but it seems the more recent updates are much more stable. -- Sylvain On Dec 15, 2012, at 7:51 AM, Brian Tarbox tar...@cabotresearch.com wrote: We've reverted all machines back to Java 6 after running into numerous Java 7 issues...some running Cassandra, some running Zookeeper, others just general problems. I don't recall any other major language release being such a mess. On Fri, Dec 14, 2012 at 5:07 PM, Bill de hÓra b...@dehora.net wrote: At least that would be one way of defining officially supported. Not quite, because, Datastax is not Apache Cassandra. the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. Exactly right. If enough people use Cassandra on Java7 and enough people file bugs about Java 7 and enough people work on bugs for Java 7 then Cassandra will eventually work well enough on Java7. Bill On 14 Dec 2012, at 19:43, Drew Kutcharian d...@venarc.com wrote: In addition, the DataStax official documentation states: Versions earlier than 1.6.0_19 should not be used. Java 7 is not recommended. http://www.datastax.com/docs/1.1/install/install_rpm On Dec 14, 2012, at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: Does Datastax (or any other company) support Cassandra under Java 7? Or will they tell you to downgrade when you have some problem, because they don't support C* running on 7? At least that would be one way of defining officially supported. On Fri, Dec 14, 2012 at 2:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: What kind of official statement do you want? As far as I can be considered an official voice of the project, my statement is: various people run in production with Java 7 and it seems to work. Or to answer the initial question, the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. We do obviously encourage people to try Java 7 as much as possible and report any problem they may run into, but I would have though this goes without saying. 
On Fri, Dec 14, 2012 at 4:05 AM, Rob Coli rc...@palominodb.com wrote: On Thu, Dec 13, 2012 at 11:43 AM, Drew Kutcharian d...@venarc.com wrote: With Java 6 begin EOL-ed soon (https://blogs.oracle.com/java/entry/end_of_public_updates_for), what's the status of Cassandra's Java 7 support? Anyone using it in production? Any outstanding *known* issues? I'd love to see an official statement from the project, due to the sort of EOL issues you're referring to. Unfortunately previous requests on this list for such a statement have gone unanswered. The non-official response is that various people run in production with Java 7 and it seems to work. :) =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
Re: State of Cassandra and Java 7
We've reverted all machines back to Java 6 after running into numerous Java 7 issues...some running Cassandra, some running Zookeeper, others just general problems. I don't recall any other major language release being such a mess. On Fri, Dec 14, 2012 at 5:07 PM, Bill de hÓra b...@dehora.net wrote: At least that would be one way of defining officially supported. Not quite, because, Datastax is not Apache Cassandra. the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. Exactly right. If enough people use Cassandra on Java7 and enough people file bugs about Java 7 and enough people work on bugs for Java 7 then Cassandra will eventually work well enough on Java7. Bill On 14 Dec 2012, at 19:43, Drew Kutcharian d...@venarc.com wrote: In addition, the DataStax official documentation states: Versions earlier than 1.6.0_19 should not be used. Java 7 is not recommended. http://www.datastax.com/docs/1.1/install/install_rpm On Dec 14, 2012, at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: Does Datastax (or any other company) support Cassandra under Java 7? Or will they tell you to downgrade when you have some problem, because they don't support C* running on 7? At least that would be one way of defining officially supported. On Fri, Dec 14, 2012 at 2:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: What kind of official statement do you want? As far as I can be considered an official voice of the project, my statement is: various people run in production with Java 7 and it seems to work. Or to answer the initial question, the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. We do obviously encourage people to try Java 7 as much as possible and report any problem they may run into, but I would have though this goes without saying. On Fri, Dec 14, 2012 at 4:05 AM, Rob Coli rc...@palominodb.com wrote: On Thu, Dec 13, 2012 at 11:43 AM, Drew Kutcharian d...@venarc.com wrote: With Java 6 begin EOL-ed soon (https://blogs.oracle.com/java/entry/end_of_public_updates_for), what's the status of Cassandra's Java 7 support? Anyone using it in production? Any outstanding *known* issues? I'd love to see an official statement from the project, due to the sort of EOL issues you're referring to. Unfortunately previous requests on this list for such a statement have gone unanswered. The non-official response is that various people run in production with Java 7 and it seems to work. 
:) =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: Memory Manager
Can you supply your java parameters? On Mon, Nov 12, 2012 at 7:29 AM, Everton Lima peitin.inu...@gmail.com wrote: Hi people, I was using cassandra on a distributed project. I am using java 6 and cassandra 1.1.6. My problem is in the memory manager (I think). My system was throwing a heap limit exception. The problem is that after some inserts (2Gb) the Old Gen area of the heap fills up and cannot be cleaned. This problem only occurs when I use more than one machine; with only one machine, after the inserts the GC cleans the Old Gen. Can someone help me? Thanks! -- Everton Lima Aleixo Bacharel em Ciencia da Computação Universidade Federal de Goiás
problem encrypting keys and data
We have a requirement to store our data encrypted. Our encryption system turns our various strings into byte arrays. So far so good. The problem is that the bytes in our byte arrays are sometimes negative...but when we look at them in the cassandra-cli (or try to programmatically retrieve them) the bytes are all positive, so we of course don't find the expected data. We have tried Byte encoding and UTF8 encoding without luck. In looking at the Byte validator in particular I see nothing that ought to care about the sign of the bytes, but perhaps I'm missing something. Any suggestions would be appreciated, thanks. Brian Tarbox
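One thing that often trips people up here: Java bytes are signed, so encrypted output will routinely contain "negative" bytes, but the underlying bit pattern is the same either way, and cassandra-cli displays BytesType values as unsigned hex. A small sketch (with a made-up ciphertext array) showing the usual way to compare what was written against what the cli shows:

// Sketch: a signed Java byte and its unsigned hex rendering are the same bits.
// Comparing hex strings is usually the safest way to check that what the client
// wrote is what cassandra-cli shows for a BytesType column.
public class SignedBytes {
    public static void main(String[] args) {
        byte[] ciphertext = {(byte) 0xF3, 0x07, (byte) 0x9A, 0x41}; // stand-in for encrypted data

        StringBuilder hex = new StringBuilder();
        for (byte b : ciphertext) {
            // b & 0xFF maps the signed byte (e.g. -13) onto its unsigned value (0xF3)
            hex.append(String.format("%02x", b & 0xFF));
        }
        System.out.println(hex); // f3079a41 -- the form cassandra-cli would typically display
    }
}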
my ubuntu system has a fuseblk filesystem, would it go faster if I changed it to EXT4?
I got some new ubuntu servers to add to my cluster and found that the file system is fuseblk, which really means NTFS. All else being equal would I expect to get any performance boost if I converted the file system to EXT4? Edward Capriolo's Cassandra cookbook seems to suggest so. Thanks. Brian Tarbox
why does my Effective-Ownership and Load from ring give such different answers?
I had a two node cluster that I expanded to four nodes. I ran the token generation script and moved all the nodes so that when I run nodetool ring each node reports 25% Effective-Ownership. However, my load numbers map out to 39%, 30%, 15%, 17%. How can that be? Thanks.
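For reference, with RandomPartitioner the token generation script just spaces tokens evenly across the 0..2^127 range, so equal effective ownership only means equal key ranges; load can still diverge if a few rows are very wide or if nodes still hold data from ranges they no longer own and haven't been cleaned up. A sketch of the arithmetic for a 4-node ring (node count is the only input):

// Sketch: evenly spaced RandomPartitioner tokens for an N-node ring,
// i.e. what "25% effective ownership" corresponds to in token terms.
import java.math.BigInteger;

public class EvenTokens {
    public static void main(String[] args) {
        int nodeCount = 4;
        BigInteger range = BigInteger.valueOf(2).pow(127); // RandomPartitioner token space
        for (int i = 0; i < nodeCount; i++) {
            BigInteger token = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(nodeCount));
            System.out.println("node " + i + ": " + token);
        }
    }
}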
read performance plummeted
I have a two node cluster hosting a 45 gig dataset. I periodically have to read a high fraction (20% or so) of my 'rows', grabbing a few thousand at a time and then processing them. This used to result in about 300-500 reads a second which seemed quite good. Recently that number has plummeted to 20-50 reads a second. The obvious question is what did I change? I certainly added more data, bringing my total load from 38 or so gig to 45 or so gig, but it's hard to imagine that causing this problem. The shape of my data has not changed and I haven't changed any cassandra configuration. Running nodetool tpstats I'm for the first time ever seeing entries under ReadStage Active and Pending, which correlates with slow reads. Running iostat I'm seeing significant iowait (10-50%) where I previously never saw higher than 1-2%. I ran a full compaction on the relevant CF (which took 3.5 hours) to no avail. Any suggestions on where I can look next? Thanks.
can I have a mix of 32 and 64 bit machines in a cluster?
I can't imagine why this would be a problem but I wonder if anyone has experience with running a mix of 32 and 64 bit nodes in a cluster. (I'm not going to do this in production, just trying to make use of the gear I have for my local system). Thanks.