RE: Exception when setting tokens for the cassandra nodes
Oh, my bad. Thanks mate, that worked.

On Apr 29, 2013 10:03 PM, moshe.kr...@barclays.com wrote:

For starters: if you are using the Murmur3 partitioner, which is the default in cassandra.yaml, then you need to calculate the tokens using:

python -c 'print [str(((2**64 / 2) * i) - 2**63) for i in range(2)]'

which gives the following values: ['-9223372036854775808', '0']

From: Rahul [mailto:rahule...@gmail.com]
Sent: Monday, April 29, 2013 7:23 PM
To: user@cassandra.apache.org
Subject: Exception when setting tokens for the cassandra nodes

Hi, I am testing out Cassandra 1.2 on two of my local servers, but I am having problems assigning tokens to my nodes: when I use nodetool to set a token, I get a Java exception. My test setup is as follows:

Node1: local ip 1 (seed)
Node2: local ip 2 (seed)

Since I have two nodes, I calculated the tokens as 0 and 2^127/2 = 85070591730234615865843651857942052864. I was able to set token 0 for my first node using "nodetool move 0", but when I try to set 85070591730234615865843651857942052864 for my second node, it throws an UndeclaredThrowableException in the main thread. The full stack trace is attached below.
user@server~$ nodetool move 85070591730234615865843651857942052864

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at $Proxy0.getTokenToEndpointMap(Unknown Source)
    at org.apache.cassandra.tools.NodeProbe.getTokenToEndpointMap(NodeProbe.java:288)
    at org.apache.cassandra.tools.NodeCmd.printRing(NodeCmd.java:215)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1051)
Caused by: javax.management.InstanceNotFoundException: org.apache.cassandra.db:type=StorageService
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:643)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:668)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1463)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
    at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:656)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:273)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:251)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:160)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:901)
    at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:280)

Any suggestions towards solving this problem would be deeply appreciated.

thanks, rahul
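As the reply in this thread points out, Murmur3Partitioner (the 1.2 default) uses the signed 64-bit range, while 2^127/2 is a RandomPartitioner-style token. The evenly-spaced token arithmetic for both partitioners can be sketched as below; the helper names are made up for illustration and are not part of any Cassandra tool:

```python
# Evenly spaced token generation for a given cluster size.

def murmur3_tokens(num_nodes):
    # Murmur3Partitioner: signed 64-bit ring, -2**63 .. 2**63 - 1
    return [(2**64 // num_nodes) * i - 2**63 for i in range(num_nodes)]

def random_partitioner_tokens(num_nodes):
    # RandomPartitioner: ring is 0 .. 2**127 - 1
    return [(2**127 // num_nodes) * i for i in range(num_nodes)]

print(murmur3_tokens(2))             # [-9223372036854775808, 0]
print(random_partitioner_tokens(2))  # second value is 2**127 // 2, the token from the question
```

Feeding a RandomPartitioner-sized token to a Murmur3 cluster is out of range, which is one reason to regenerate tokens with the formula from the reply rather than reuse 2^127-based values.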
nodetool status OWNS and multiple DCs
Hello. I have set up a test cluster of 2 DCs with 1 node in each DC. In each config I specified 256 virtual nodes and chose GossipingPropertyFileSnitch.

For node1:
~/Cassandra$ cat /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1

For node2:
~/Cassandra$ cat /etc/cassandra/cassandra-rackdc.properties
dc=DC2
rack=RAC1

When I call nodetool status, it shows less than 100% ownership of tokens for each DC:

:~/Cassandra$ nodetool status
Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address     Load      Tokens  Owns   Host ID                               Rack
UN 172.16.0.1  97.43 KB  256     51,5%  7ffd2432-46c9-443c-aa0c-8bfd960d2acc  RAC1
Datacenter: DC2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address     Load      Tokens  Owns   Host ID                               Rack
UN 172.16.0.2  91.78 KB  256     48,5%  93d3bdbf-8625-4e54-8c7c-087fbfe419f5  RAC1

I thought that each datacenter has 100% coverage of the token range. What does the value in the Owns field mean, and how does it affect replication (for example, with replication factors DC1:1, DC2:2)?

Thanks in advance, Sergey Naumov.
Exporting all data within a keyspace
Is there any easy way of exporting all data for a keyspace (and, conversely, importing it)?

Regards
Chiddu
Re: Exporting all data within a keyspace
Try sstable2json and json2sstable. They work at the column family level, so you can fetch the list of column families, iterate over it, and use the sstable2json tool to extract the data for each one. Remember this will only fetch on-disk data: anything in the memtable/cache that has yet to be flushed will be missed. So run a compaction first and then run the script.

On Tuesday, April 30, 2013, Chidambaran Subramanian wrote:
Is there any easy way of exporting all data for a keyspace (and conversely) importing it. Regards Chiddu
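The per-column-family loop suggested here can be sketched as follows. The data directory location and the 1.x SSTable naming convention (`<cf>-<version>-<generation>-Data.db` under `<data_dir>/<keyspace>/`) are assumptions to adapt for your install; the script only builds the commands, it does not run them:

```python
import os

DATA_DIR = "/var/lib/cassandra/data"  # assumed default data file directory

def export_commands(keyspace, sstable_files, out_dir="/tmp/export"):
    """Build one sstable2json command per -Data.db file found for the
    keyspace. sstable_files is a directory listing of the keyspace's data dir."""
    cmds = []
    for fname in sorted(sstable_files):
        if not fname.endswith("-Data.db"):
            continue  # skip -Index.db, -Filter.db, and other components
        cf = fname.split("-")[0]  # 1.x SSTable names start with the CF name
        src = os.path.join(DATA_DIR, keyspace, fname)
        cmds.append("sstable2json %s > %s" % (src, os.path.join(out_dir, cf + ".json")))
    return cmds

# Example with hypothetical file names:
for cmd in export_commands("demo_ks", ["users-hc-1-Data.db", "users-hc-1-Index.db"]):
    print(cmd)
```

Note that sstable2json is per-SSTable, so a CF with several generations would need distinct output names (and a merge step) in a real script.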
Re: Exporting all data within a keyspace
You could always do something like this as well:
http://brianoneill.blogspot.com/2012/05/dumping-data-from-cassandra-like.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive, King of Prussia, PA 19406
M: 215.588.6024
@boneill42 http://www.twitter.com/boneill42
healthmarketscience.com

This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited.

From: Kumar Ranjan winnerd...@gmail.com
Date: Tuesday, April 30, 2013 9:11 AM
To: user@cassandra.apache.org
Subject: Re: Exporting all data within a keyspace
Re: normal thread counts?
I use phpcassa. I did a thread dump; 99% of the threads look very similar (I'm using 1.1.9 in terms of matching source lines). The thread names are all like this: WRITE-/10.x.y.z, and there are a LOT of duplicates (in terms of the same IP). Many of the threads are trying to talk to IPs that aren't in the cluster (I assume they are the IPs of dead hosts). The stack trace is basically the same for them all, attached at the bottom.

There are a lot of things I could talk about in terms of my situation, but what I think might be pertinent to this thread: I hit a tipping point recently and upgraded a 9-node cluster from AWS m1.large to m1.xlarge (rolling, one at a time). 7 of the 9 upgraded fine and work great; 2 of the 9 keep struggling. I've replaced them many times now, each time using this process:
http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node
And even this morning the only two nodes with a high number of threads are those two (yet again), and at some point they'll OOM. It seems like there is something about my cluster (caused by the recent upgrade?) that causes a thread leak in OutboundTcpConnection, but I don't know how to escape from the trap. Any ideas?
stackTrace = [
  { className = sun.misc.Unsafe; fileName = Unsafe.java; lineNumber = -2; methodName = park; nativeMethod = true; },
  { className = java.util.concurrent.locks.LockSupport; fileName = LockSupport.java; lineNumber = 158; methodName = park; nativeMethod = false; },
  { className = java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject; fileName = AbstractQueuedSynchronizer.java; lineNumber = 1987; methodName = await; nativeMethod = false; },
  { className = java.util.concurrent.LinkedBlockingQueue; fileName = LinkedBlockingQueue.java; lineNumber = 399; methodName = take; nativeMethod = false; },
  { className = org.apache.cassandra.net.OutboundTcpConnection; fileName = OutboundTcpConnection.java; lineNumber = 104; methodName = run; nativeMethod = false; }
];

--

On Mon, Apr 29, 2013 at 4:31 PM, aaron morton aa...@thelastpickle.com wrote:

I used JMX to check current number of threads in a production cassandra machine, and it was ~27,000.
That does not sound too good. My first guess would be lots of client connections. What client are you using, and does it do connection pooling? See the comments in cassandra.yaml around rpc_server_type: the default, sync, uses one thread per connection; you may be better off with HSHA. But if your app is leaking connections you should probably deal with that first.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 3:07 AM, William Oberman ober...@civicscience.com wrote:

Hi, I'm having some issues. After a day or two of runtime, I keep getting:

ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: unable to create new native thread

I've checked and my system settings seem acceptable:
memlock=unlimited
nofiles=10
nproc=122944
I've messed with heap sizes from 6-12GB (15 physical, m1.xlarge in AWS), and I keep OOM'ing with the above error.

I've found some (what seem to me) obscure references to the stack size interacting with the number of threads. If I'm understanding it correctly, to reason about Java memory usage I have to think of OS + heap as locked down; the stacks get the leftovers of physical memory, and each thread gets a stack. For me, the system ulimit setting on stack is 10240k (no idea if Java sees or respects this setting). My -Xss for cassandra is the default (I hope, I don't remember messing with it) of 180k. I used JMX to check the current number of threads on a production cassandra machine, and it was ~27,000. Is that a normal thread count? Could my OOM be related to stack size + number of threads, or am I overlooking something simpler?

will
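The stack arithmetic in the question is easy to sanity-check: thread stacks live outside the Java heap, so the memory cost of a thread pile-up is roughly thread count times -Xss. A back-of-envelope using the numbers from this thread (the exact per-thread overhead varies by JVM and OS, so treat this as an estimate only):

```python
# Rough estimate: each Java thread reserves one stack of -Xss outside the
# heap, so total stack reservation ~= thread_count * Xss.
def thread_stack_mb(thread_count, xss_kb):
    return thread_count * xss_kb / 1024.0

# ~27,000 threads at the default -Xss180k reported in the thread:
print("%.1f MB" % thread_stack_mb(27000, 180))  # roughly 4.6 GB of stacks
```

At that scale the OOM is plausibly about exhausting native memory (or the OS thread limit) rather than the heap, which is consistent with "unable to create new native thread".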
SSTables not opened on new cluster
Hello, I'm trying to bring up a copy of an existing 3-node cluster running 1.0.8 as a 3-node cluster running 1.0.11. The new cluster has been configured to have the same tokens and the same partitioner. Initially, I copied the files in the data directory of each node into their corresponding node on the new cluster. When starting the new cluster, it didn't pick up the KS/CF. So I removed the directories, created the schema, stopped cassandra and copied the data back. None of the SSTables are opened at startup, and when I nodetool refresh them, it says "No new SSTables were found for XXX/YYY" where X is my KS and Y is my CF. I'm totally stumped: any ideas as to why this is happening? I've checked that there are actual files there and that the permissions are correct. Thanks.
What does a healthy node look like?
Hi, I am having trouble finding quantitative information on what a healthy Cassandra node should look like (CPU usage, number of flushes, SSTables, compactions, GC), given a certain hardware spec and read/write load. I have trouble gauging our first and only Cassandra node: does it need tuning, or is it simply overloaded? If anyone could point me to some data, that would be very helpful. (So far I have run the node with the default settings in cassandra.yaml and cassandra-env. The log claims that the server is occasionally under memory pressure, and I get frequent timeouts for writes. I see what I think are many flushes, compactions and GCs in the log. Some toying with the heap and new-gen sizes, the key cache, and the compaction throughput settings did not improve the overall situation much.) Thanks! Ralf
Re: error with Cassandra ring and Hadoop connection
java.lang.RuntimeException: UnavailableException()
Looks like the pig script could talk to one node, but the coordinator could not process the request at the consistency level requested. Check that all the nodes are up, that the RF is set to the correct value, and the CL you are using.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 4:55 AM, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote:

hi all:

I can run pig with cassandra and hadoop in EC2. I'm now trying to run pig against a cassandra ring plus hadoop. The cassandra ring has the tasktrackers and datanodes too, and I am running pig from another machine where I have installed the namenode/jobtracker. I have a simple script to load data from the pygmalion keyspace and column family "account" and dump the result to test.

I installed another simple local cassandra on the namenode/jobtracker machine and can run pig jobs OK there, but when I try to run the script against the cassandra ring (changing the environment variable PIG_INITIAL_ADDRESS to the IP of one of the nodes of the cassandra ring) I get this error:

---
java.lang.RuntimeException: UnavailableException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:184)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(CassandraStorage.java:226)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12924)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
    ... 17 more

Can anybody help me, or does anyone have any idea? Thanks in advance.

PS:
1. The ports are open in EC2.
2. The keyspace and CF are created in the cassandra cluster in EC2 too, and likewise at the namenode cassandra installation.
3. I have this .bash_profile configuration:

# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH=$PATH:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export CASSANDRA_HOME=/home/ec2-user/apache-cassandra-1.2.4
export PIG_HOME=/home/ec2-user/pig-0.11.1-src
export PIG_INITIAL_ADDRESS=10.210.164.233
#export PIG_INITIAL_ADDRESS=127.0.0.1
export PIG_RPC_PORT=9160
export PIG_CONF_DIR=/home/ec2-user/hadoop-1.1.1/conf
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
#export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

4. I export all the cassandra jars in hadoop-env.sh for all nodes of hadoop.
5. I have the same error running pig in local mode.
6. If I change to RandomPartitioner and reload the changes, I get this error:

java.lang.RuntimeException: InvalidRequestException(why:Start token sorts after end token)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
Re: Compaction, Slow Ring, and bad behavior
Check the logs for warnings from the GCInspector. If you see messages that correlate with compaction running, limit compaction to help stabilise things:

* set concurrent_compactors to 2
* if you have wide rows, reduce in_memory_compaction_limit
* reduce compaction_throughput

If you have a lot (more than 200 million) of rows, check the size of the bloom filters using nodetool cfstats. If it's around 1GB, consider increasing bloom_filter_fp_chance per CF to 0.01 or 0.1.

I've tried changing the amount of RAM between 8G and 12G,
More JVM memory is not always the answer. Try to get back to stable on the defaults, or something close to them, and then tune from there.

sometimes gets stuck on a compaction with near-idle disk throughput
Wide rows can slow down compaction; check the row size with nodetool cfstats or nodetool cfhistograms.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 5:33 AM, Drew from Zhrodague drewzhroda...@zhrodague.net wrote:

Hi, we have a 9-node ring on m1.xlarge AWS hosts. We started having some trouble a while ago, and it's making me pull out all of my hair. The host in position #3 has been replaced 4 times. Each time, the host joins the ring, I do a nodetool repair -pr, and she seems fine for about a day. Then she gets real slow, sometimes OOMs, sometimes takes down the host in position #5, sometimes gets stuck on a compaction with near-idle disk throughput, and eventually dies without any kind of error message or reason for failing. Sometimes our cluster gets so slow that it is almost unusable: we get timeout errors from our application, and AWS sends us voluminous alerts about latency. I've tried changing the amount of RAM between 8G and 12G, changing MAX_HEAP_SIZE and HEAP_NEWSIZE, repeatedly forcing a stop compaction, setting astronomical ulimit values, and praying to available gods. I'm a bit confused. We're not using super-wide rows; most things are default.

EL5, Cassandra 1.1.9, Java 1.6.0

-- Drew from Zhrodague lolcat divinator d...@zhrodague.net
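The bloom filter advice in the reply above can be sanity-checked with the standard bloom filter sizing formula, m = -n * ln(p) / ln(2)^2 bits for n keys at false-positive chance p. This is the generic textbook estimate, not Cassandra's exact on-disk layout, and the row count below is illustrative:

```python
import math

def bloom_filter_mb(rows, fp_chance):
    """Approximate bloom filter size in MB for `rows` keys at the given
    false-positive chance, using m = -n*ln(p)/ln(2)**2 bits."""
    bits = -rows * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8 / 1024 / 1024

# 200 million rows at the two fp_chance values suggested above:
for p in (0.01, 0.1):
    print(p, round(bloom_filter_mb(200_000_000, p)), "MB")
```

Going from fp_chance 0.01 to 0.1 roughly halves the filter size, which is the trade the reply is describing: less heap spent on filters in exchange for more false-positive disk reads.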
Re: Really odd issue (AWS related?)
We've also had issues with ephemeral drives in a single AZ in us-east-1, so much so that we no longer use that AZ. Though our issues tended to be obvious from instance boot - they wouldn't suddenly degrade.

On Apr 28, 2013, at 2:27 PM, Alex Major wrote:

Hi Mike, we had issues with the ephemeral drives when we first got started, although we never got to the bottom of it, so I can't help much with troubleshooting unfortunately. Contrary to a lot of the comments on the mailing list, we've actually had a lot more success with EBS drives (PIOPS!). I'd definitely suggest trying striping 4 EBS drives (RAID 0) and using PIOPS. You could be having a noisy-neighbour problem; I don't believe that m1.large or m1.xlarge instances get all of the actual hardware, and virtualisation on EC2 still sucks at isolating resources. We've also had more success with Ubuntu on EC2 - not so much with our Cassandra nodes, but some of our other services didn't run as well on Amazon Linux AMIs. Alex

On Sun, Apr 28, 2013 at 7:12 PM, Michael Theroux mthero...@yahoo.com wrote:

I forgot to mention: when things go really bad, I'm seeing I/O waits in the 80-95% range. I restarted cassandra once when a node was in this situation, and it took 45 minutes to start (primarily reading SSTables). Typically, a node would start in about 5 minutes. Thanks, -Mike

On Apr 28, 2013, at 12:37 PM, Michael Theroux wrote:

Hello, we've done some additional monitoring, and I think we have more information. We've been collecting vmstat information every minute, attempting to catch a node with issues. It appears that the cassandra node runs fine, then suddenly, without any correlation to any event that I can identify, the I/O wait time goes way up and stays up indefinitely. Even non-cassandra I/O activities (such as snapshots and backups) start causing large I/O wait times when they typically would not. Previous to an issue, we would typically see I/O wait times of 3-4% with very few blocked processes on I/O. Once this issue manifests itself, I/O wait times for the same activities jump to 30-40% with many blocked processes. The I/O wait times do go back down when there is literally no activity.

- Updating the node to the latest Amazon Linux patches and rebooting the instance doesn't correct the issue.
- Backing up the node and replacing the instance does correct the issue; I/O wait times return to normal.

One relatively recent change we've made is that we upgraded to m1.xlarge instances, which have 4 ephemeral drives available. We create a logical volume from the 4 drives with the idea that we should be able to get increased I/O throughput. When we ran m1.large instances, we had the same setup, although it was only using 2 ephemeral drives. We chose to use LVM vs. mdadm because we were having issues getting mdadm to create the raid volume reliably on restart (and research showed that this was a common problem). LVM just worked (and had worked for months before this upgrade). For reference, this is the script we used to create the logical volume:

vgcreate mnt_vg /dev/sdb /dev/sdc /dev/sdd /dev/sde
lvcreate -L 1600G -n mnt_lv -i 4 mnt_vg -I 256K
blockdev --setra 65536 /dev/mnt_vg/mnt_lv
sleep 2
mkfs.xfs /dev/mnt_vg/mnt_lv
sleep 3
mkdir -p /data
mount -t xfs -o noatime /dev/mnt_vg/mnt_lv /data
sleep 3

Another tidbit... thus far (and this may be only a coincidence), we've only had to replace DB nodes within a single availability zone within us-east. Other availability zones, in the same region, have yet to show an issue. It looks like I'm going to need to replace a third DB node today. Any advice would be appreciated. Thanks, -Mike

On Apr 26, 2013, at 10:14 AM, Michael Theroux wrote:

Thanks. We weren't monitoring this value when the issue occurred, and this particular issue has not appeared for a couple of days (knock on wood). Will keep an eye out though. -Mike

On Apr 26, 2013, at 5:32 AM, Jason Wee wrote:

top command?
st: time stolen from this vm by the hypervisor
jason

On Fri, Apr 26, 2013 at 9:54 AM, Michael Theroux mthero...@yahoo.com wrote:

Sorry, not sure what CPU steal is :) I have the AWS console with detailed monitoring enabled... things seem to track close to the minute, so I can see the CPU load go to 0... then jump at about the minute Cassandra reports the dropped messages. -Mike

On Apr 25, 2013, at 9:50 PM, aaron morton wrote:

The messages appear right after the node wakes up.
Are you tracking CPU steal ?

- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 25/04/2013, at 4:15 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Apr 24, 2013 at 5:03 AM, Michael Theroux mthero...@yahoo.com wrote:
Another related question. Once we see messages being dropped on one node, our
Re: Exporting all data within a keyspace
Thanks guys, both are good pointers.

Regards
Chiddu

On Tue, Apr 30, 2013 at 7:09 PM, Brian O'Neill b...@alumni.brown.edu wrote:
You could always do something like this as well:
http://brianoneill.blogspot.com/2012/05/dumping-data-from-cassandra-like.html
-brian
CfP 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)
we apologize if you receive multiple copies of this message

===
CALL FOR PAPERS

2013 Workshop on Middleware for HPC and Big Data Systems
MHPC '13

as part of Euro-Par 2013, Aachen, Germany
===

Date: August 27, 2013
Workshop URL: http://m-hpc.org
Springer LNCS

SUBMISSION DEADLINE:
May 31, 2013 - LNCS Full paper submission (rolling abstract submission)
June 28, 2013 - Lightning Talk abstracts

SCOPE

Extremely large, diverse, and complex data sets are generated from scientific applications, the Internet, social media and other applications. Data may be physically distributed and shared by an ever larger community. Collecting, aggregating, storing and analyzing large data volumes presents major challenges. Processing such amounts of data efficiently has been an issue to scientific discovery and technological advancement. In addition, making the data accessible, understandable and interoperable includes unsolved problems. Novel middleware architectures, algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the intersection of HPC and Big Data with regard to middleware handling and optimizations. Scope is existing and proposed middleware for HPC and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects, middleware and framework developers, data-intensive application developers as well as users from the scientific and engineering community to exchange their experience in processing large datasets and to report their scientific achievement and innovative ideas. The workshop also offers a dedicated forum for these researchers to access the state of the art, to discuss problems and requirements, to identify gaps in current and planned designs, and to collaborate in strategies for scalable data-intensive computing.
The workshop will be one day in length, composed of 20 min paper presentations, each followed by 10 min discussion sections. Presentations may be accompanied by interactive demonstrations.

TOPICS

Topics of interest include, but are not limited to:
- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig, Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyrack
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling middleware, focusing on indexing/fast storing or retrieval between compute and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
May 31, 2013 - Full paper submission
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification
August 27, 2013 - Workshop Date

TPC CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guom, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer - Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz - Lawrence Livermore National Laboratory
Viral Shah,
Re: cassandra-shuffle time to completion and required disk space
These are taken just before starting shuffle (ran repair/cleanup the day before). During shuffle disabled all reads/writes to the cluster. nodetool status keyspace: Load Tokens Owns (effective) Host ID 80.95 GB 256 16.7% 754f9f4c-4ba7-4495-97e7-1f5b6755cb27 I'm a little confused when nodetool status was showing 256 tokens before the shuffle was run. Did you set num_tokens during the upgrade process ? But I doubt that would change anything as the inital_token is set in the system tables during bootstrap. `bin/cassandra-shuffle ls` will show the list of moves the shuffle process was/is going to run. What does that say? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 30/04/2013, at 5:08 AM, John Watson j...@disqus.com wrote: That's what we tried first before the shuffle. And ran into the space issue. That's detailed in another thread title: Adding nodes in 1.2 with vnodes requires huge disks On Mon, Apr 29, 2013 at 4:08 AM, Sam Overton s...@acunu.com wrote: An alternative to running shuffle is to do a rolling bootstrap/decommission. You would set num_tokens on the existing hosts (and restart them) so that they split their ranges, then bootstrap in N new hosts, then decommission the old ones. On 28 April 2013 22:21, John Watson j...@disqus.com wrote: The amount of time/space cassandra-shuffle requires when upgrading to using vnodes should really be apparent in documentation (when some is made). Only semi-noticeable remark about the exorbitant amount of time is a bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance Shuffling will entail moving a lot of data around the cluster and so has the potential to consume a lot of disk and network I/O, and to take a considerable amount of time. For this to be an online operation, the shuffle will need to operate on a lower priority basis to other streaming operations, and should be expected to take days or weeks to complete. 
We tried running shuffle on a QA version of our cluster and 2 things were brought to light:
- Even with no reads/writes it was going to take 20 days
- Each machine needed enough free disk space to potentially hold the entire cluster's sstables on disk

Regards,
John

--
Sam Overton
Acunu | http://www.acunu.com | @acunu
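As an aside on the num_tokens / initial_token distinction raised above: for a cluster that is not using vnodes, evenly spaced initial_token values for the Murmur3Partitioner (token range -2**63 to 2**63 - 1) can be computed with a short script. A sketch, not taken from this thread:

```python
# Evenly spaced initial_token values for the Murmur3Partitioner,
# whose token range is [-2**63, 2**63 - 1].
def murmur3_tokens(node_count):
    return [-2**63 + i * (2**64 // node_count) for i in range(node_count)]

print(murmur3_tokens(2))  # → [-9223372036854775808, 0]
```

With vnodes you would instead set num_tokens and let each node pick its 256 random tokens, which is what makes the shuffle step necessary when converting an existing cluster.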
Re: nodetool status OWNS and multiple DCs
> I thought that each datacenter has 100% coverage of the token range. What does the value in the Owns field mean, and how does it affect replication (for example, with replication factors DC1:1, DC2:2)?

Run the command and specify your keyspace; that will tell nodetool to use the replication strategy specified for the KS when calculating the layout. If it's NTS you should see what you expect.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 8:50 PM, Sergey Naumov sknau...@gmail.com wrote:

Hello. I have set up a test cluster of 2 DCs with 1 node in each DC. In each config I specified 256 virtual nodes and chose GossipingPropertyFileSnitch.

For node1:
~/Cassandra$ cat /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1

For node2:
~/Cassandra$ cat /etc/cassandra/cassandra-rackdc.properties
dc=DC2
rack=RAC1

When I call nodetool status, it shows not 100% ownership of tokens for each DC:

:~/Cassandra$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load      Tokens  Owns   Host ID                               Rack
UN  172.16.0.1  97.43 KB  256     51,5%  7ffd2432-46c9-443c-aa0c-8bfd960d2acc  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load      Tokens  Owns   Host ID                               Rack
UN  172.16.0.2  91.78 KB  256     48,5%  93d3bdbf-8625-4e54-8c7c-087fbfe419f5  RAC1

I thought that each datacenter has 100% coverage of the token range. What does the value in the Owns field mean, and how does it affect replication (for example, with replication factors DC1:1, DC2:2)?

Thanks in advance,
Sergey Naumov.
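The roughly 50/50 split above is what you would expect when Owns is computed over the whole ring rather than per keyspace: without a keyspace argument, nodetool divides the single token ring among all nodes regardless of DC. A small simulation (the two addresses are from the output above; the random vnode tokens are made up) showing that the two nodes' shares sum to 100% across the cluster, not 100% per DC:

```python
import random

# Simulate the cluster-wide "Owns" column: two nodes, 256 random vnode
# tokens each, on the Murmur3 ring [-2**63, 2**63 - 1].
RING_MIN, RING_SIZE = -2**63, 2**64

random.seed(42)
nodes = {"172.16.0.1": set(), "172.16.0.2": set()}
for node, tokens in nodes.items():
    while len(tokens) < 256:
        tokens.add(random.randint(RING_MIN, RING_MIN + RING_SIZE - 1))

# Each token owns the range from the previous token (exclusive) up to itself.
ring = sorted((t, n) for n, ts in nodes.items() for t in ts)
owns = dict.fromkeys(nodes, 0)
prev = ring[-1][0] - RING_SIZE  # wrap the last range around the ring
for token, node in ring:
    owns[node] += token - prev
    prev = token

for node, owned in sorted(owns.items()):
    print(node, f"{100 * owned / RING_SIZE:.1f}%")
```

The two printed percentages always sum to 100%, so each single-node DC shows roughly 50%. With `nodetool status <keyspace>` and NTS replication, ownership is computed per DC against the replication factor, which is where the expected 100%-per-DC figure comes from.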
Re: normal thread counts?
> Many many many of the threads are trying to talk to IPs that aren't in the cluster (I assume they are the IPs of dead hosts).

Are these IPs from before the upgrade? Are they IPs you expect to see? Cross-reference them with the output from nodetool gossipinfo to see why the node thinks they should be used. Could you provide a list of the thread names?

One way to remove those IPs may be a rolling restart with -Dcassandra.load_ring_state=false in the JVM opts at the bottom of cassandra-env.sh.

The OutboundTcpConnection threads are created in pairs by the OutboundTcpConnectionPool, which is created here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L502. The threads are created in the OutboundTcpConnectionPool constructor; checking to see if this could be the source of the leak.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 2:18 AM, William Oberman ober...@civicscience.com wrote:

I use phpcassa. I did a thread dump. 99% of the threads look very similar (I'm using 1.1.9, in terms of matching source lines). The thread names are all like this: WRITE-/10.x.y.z. There are a LOT of duplicates (in terms of the same IP). Many many many of the threads are trying to talk to IPs that aren't in the cluster (I assume they are the IPs of dead hosts). The stack trace is basically the same for them all, attached at the bottom.

There are a lot of things I could talk about in terms of my situation, but what I think might be pertinent to this thread: I hit a tipping point recently and upgraded a 9-node cluster from AWS m1.large to m1.xlarge (rolling, one at a time). 7 of the 9 upgraded fine and work great. 2 of the 9 keep struggling.
I've replaced them many times now, each time using this process: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node. And even this morning the only two nodes with a high number of threads are those two (yet again), and at some point they'll OOM. It seems like there is something about my cluster (caused by the recent upgrade?) that causes a thread leak on OutboundTcpConnection, but I don't know how to escape from the trap. Any ideas?

stackTrace = [
  { className = sun.misc.Unsafe; fileName = Unsafe.java; lineNumber = -2; methodName = park; nativeMethod = true; },
  { className = java.util.concurrent.locks.LockSupport; fileName = LockSupport.java; lineNumber = 158; methodName = park; nativeMethod = false; },
  { className = java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject; fileName = AbstractQueuedSynchronizer.java; lineNumber = 1987; methodName = await; nativeMethod = false; },
  { className = java.util.concurrent.LinkedBlockingQueue; fileName = LinkedBlockingQueue.java; lineNumber = 399; methodName = take; nativeMethod = false; },
  { className = org.apache.cassandra.net.OutboundTcpConnection; fileName = OutboundTcpConnection.java; lineNumber = 104; methodName = run; nativeMethod = false; }
];

--

On Mon, Apr 29, 2013 at 4:31 PM, aaron morton aa...@thelastpickle.com wrote:

> I used JMX to check current number of threads in a production cassandra machine, and it was ~27,000.

That does not sound too good. My first guess would be lots of client connections. What client are you using? Does it do connection pooling? See the comments in cassandra.yaml around rpc_server_type; the default, sync, uses one thread per connection, and you may be better off with HSHA. But if your app is leaking connections you should probably deal with that first.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 3:07 AM, William Oberman ober...@civicscience.com wrote:

Hi, I'm having some issues.
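A quick way to produce the per-IP thread list requested above is to tally the thread names from a jstack dump. A sketch (the "WRITE-/10.x.y.z" naming is taken from the dump described in this thread; the sample text here is made up):

```python
import re
from collections import Counter

# Tally OutboundTcpConnection threads per peer IP from a jstack-style dump,
# where thread names look like "WRITE-/10.x.y.z".
def count_write_threads(dump_text):
    return Counter(re.findall(r'"WRITE-/(\d+\.\d+\.\d+\.\d+)"', dump_text))

sample = '''
"WRITE-/10.0.0.1" daemon prio=10 waiting on condition
"WRITE-/10.0.0.1" daemon prio=10 waiting on condition
"WRITE-/10.0.0.2" daemon prio=10 waiting on condition
'''
for ip, n in count_write_threads(sample).most_common():
    print(ip, n)
```

Comparing the resulting IP list against nodetool ring / gossipinfo output would show directly how many of the leaked threads belong to hosts that are no longer in the cluster.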
I keep getting:

ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread

-- after a day or two of runtime. I've checked and my system settings seem acceptable: memlock=unlimited, nofiles=10, nproc=122944. I've messed with heap sizes from 6-12GB (15 physical, m1.xlarge in AWS), and I keep OOM'ing with the above error. I've found some (what seem to me to be) obscure references to stack size interacting with the number of threads. If I'm understanding it correctly, to reason about Java memory usage I have to think of OS + heap as being locked down, the stack space gets the leftovers of physical memory, and each thread gets a stack. For me, the system ulimit setting on stack is 10240k (no
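The stack-size interaction described above can be put into rough numbers: each native thread reserves a stack outside the Java heap, so the address space left over after the heap caps the thread count. A back-of-the-envelope sketch (the 15 GB, 12 GB, and 10240k figures are from this thread; the 1 GB OS reservation is an assumption, stacks are reserved lazily, and real limits also involve overcommit and nproc):

```python
GB = 1024**3
KB = 1024

physical = 15 * GB        # m1.xlarge, per the thread
heap = 12 * GB            # largest heap size tried above
os_reserved = 1 * GB      # assumed OS/other overhead
stack = 10240 * KB        # ulimit stack size reported above

# Rough upper bound on native threads before memory runs out.
max_threads = (physical - heap - os_reserved) // stack
print(max_threads)  # → 204
```

On these numbers even a few hundred 10 MB stacks exhaust the leftover memory, which is consistent with "unable to create new native thread" appearing long before anything like 27,000 fully resident thread stacks could fit.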
Re: SSTables not opened on new cluster
Double-check the file permissions? Write some data (using cqlsh or cassandra-cli) and flush, to make sure the new files are created where you expect them to be.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 4:15 AM, Philippe watche...@gmail.com wrote:

Hello, I'm trying to bring up a copy of an existing 3-node cluster running 1.0.8 into a 3-node cluster running 1.0.11. The new cluster has been configured to have the same tokens and the same partitioner. Initially, I copied the files in the data directory of each node into their corresponding node on the new cluster. When starting the new cluster, it didn't pick up the KS/CF. So I removed the directories, created the schema, stopped cassandra and copied the data back. None of the SSTables are opened at startup. When I nodetool refresh them, it says "No new SSTables were found for XXX/YYY", where XXX is my KS and YYY is my CF. I'm totally stumped: any ideas as to why this is happening? I've checked that there are actual files there and that the permissions are correct. Thanks.
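One thing worth scripting when refresh reports no new SSTables: in the 1.0.x on-disk layout, SSTables live directly under <data_dir>/<keyspace>/ and their file names must begin with the column family name, so files copied with a stale name or into the wrong level of the tree are silently ignored. A sketch that flags -Data.db files a given CF's refresh would skip (the layout assumption is mine; the function and naming pattern are hypothetical):

```python
import os
import re

# In the 1.0.x layout, sstables for keyspace KS / column family CF sit in
# <data_dir>/KS/ with names like "CF-hc-<generation>-Data.db".  Files whose
# names don't start with the CF name won't be loaded for that CF.
def unloadable_sstables(data_dir, keyspace, cf):
    ks_dir = os.path.join(data_dir, keyspace)
    ok = re.compile(re.escape(cf) + r"-[a-z]+-\d+-Data\.db$")
    return sorted(f for f in os.listdir(ks_dir)
                  if f.endswith("-Data.db") and not ok.match(f))
```

Running something like this against the copied directories would show whether the files kept their old names, landed one directory too deep, or are simply absent.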
Re: SSTables not opened on new cluster
Hi Aaron, thanks for the response.

Permissions are correct: the owner is cassandra (ubuntu) and permissions are drwxr-xr-x. When I created the schema, the KSs were created as directories in the .../data/ directory. When I use cassandra-cli, set the CL to QUORUM, ensure two instances are up (nodetool ring) and set a bogus column, I get an UnavailableException, which is weird.

I just noticed that when the new cluster nodes start up, auto_bootstrap is set to true. Given that it's no longer in the YAML file, I didn't set it to false; would that explain it? Once I removed it and set logging to TRACE, I get these relevant log lines:

SliceQueryFilter.java (line 123) collecting 1 of 2147483647: XXX:5123@1367333147034
a bunch of: Schema.java (line 382) Adding org.apache.cassandra.config.CFMetaData@45f8acdc [cfId=1009,ksName=XXX,cfName=YYY...
Removing compacted SSTable files from XXX
No bootstrapping, leaving or moving nodes - empty pending ranges for XXX (my KS)
Table.java (line 317) Initializing XXX.YYY (KS and CF)

Any ideas?

2013/4/30 aaron morton aa...@thelastpickle.com

Double-check the file permissions? Write some data (using cqlsh or cassandra-cli) and flush, to make sure the new files are created where you expect them to be.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 4:15 AM, Philippe watche...@gmail.com wrote:

Hello, I'm trying to bring up a copy of an existing 3-node cluster running 1.0.8 into a 3-node cluster running 1.0.11. The new cluster has been configured to have the same tokens and the same partitioner. Initially, I copied the files in the data directory of each node into their corresponding node on the new cluster. When starting the new cluster, it didn't pick up the KS/CF. So I removed the directories, created the schema, stopped cassandra and copied the data back. None of the SSTables are opened at startup.
When I nodetool refresh them, it says "No new SSTables were found for XXX/YYY", where XXX is my KS and YYY is my CF. I'm totally stumped: any ideas as to why this is happening? I've checked that there are actual files there and that the permissions are correct. Thanks.
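On the auto_bootstrap point raised above: the option was dropped from the default cassandra.yaml around 1.0 but is still honored if added back (it defaults to true). When bringing up nodes that already have their tokens and data in place, it can be set explicitly; a config sketch, with no certainty that it explains the UnavailableException seen here:

```yaml
# cassandra.yaml (1.0.x): not present in the default file; defaults to true.
# Set to false when a node is restored with its data and tokens already in
# place, so it does not try to bootstrap/stream on first start.
auto_bootstrap: false
```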
Re: normal thread counts?
The issue below could result in abandoned threads under high contention, so we'll get that fixed. But we are not sure how/why it would be called so many times. If you could provide a full list of threads and the output from nodetool gossipinfo, that would help.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 8:34 AM, aaron morton aa...@thelastpickle.com wrote:

> Many many many of the threads are trying to talk to IPs that aren't in the cluster (I assume they are the IPs of dead hosts).

Are these IPs from before the upgrade? Are they IPs you expect to see? Cross-reference them with the output from nodetool gossipinfo to see why the node thinks they should be used. Could you provide a list of the thread names?

One way to remove those IPs may be a rolling restart with -Dcassandra.load_ring_state=false in the JVM opts at the bottom of cassandra-env.sh.

The OutboundTcpConnection threads are created in pairs by the OutboundTcpConnectionPool, which is created here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L502. The threads are created in the OutboundTcpConnectionPool constructor; checking to see if this could be the source of the leak.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 2:18 AM, William Oberman ober...@civicscience.com wrote:

I use phpcassa. I did a thread dump. 99% of the threads look very similar (I'm using 1.1.9, in terms of matching source lines). The thread names are all like this: WRITE-/10.x.y.z. There are a LOT of duplicates (in terms of the same IP). Many many many of the threads are trying to talk to IPs that aren't in the cluster (I assume they are the IPs of dead hosts). The stack trace is basically the same for them all, attached at the bottom.
There are a lot of things I could talk about in terms of my situation, but what I think might be pertinent to this thread: I hit a tipping point recently and upgraded a 9-node cluster from AWS m1.large to m1.xlarge (rolling, one at a time). 7 of the 9 upgraded fine and work great. 2 of the 9 keep struggling. I've replaced them many times now, each time using this process: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node. And even this morning the only two nodes with a high number of threads are those two (yet again), and at some point they'll OOM. It seems like there is something about my cluster (caused by the recent upgrade?) that causes a thread leak on OutboundTcpConnection, but I don't know how to escape from the trap. Any ideas?

stackTrace = [
  { className = sun.misc.Unsafe; fileName = Unsafe.java; lineNumber = -2; methodName = park; nativeMethod = true; },
  { className = java.util.concurrent.locks.LockSupport; fileName = LockSupport.java; lineNumber = 158; methodName = park; nativeMethod = false; },
  { className = java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject; fileName = AbstractQueuedSynchronizer.java; lineNumber = 1987; methodName = await; nativeMethod = false; },
  { className = java.util.concurrent.LinkedBlockingQueue; fileName = LinkedBlockingQueue.java; lineNumber = 399; methodName = take; nativeMethod = false; },
  { className = org.apache.cassandra.net.OutboundTcpConnection; fileName = OutboundTcpConnection.java; lineNumber = 104; methodName = run; nativeMethod = false; }
];

--

On Mon, Apr 29, 2013 at 4:31 PM, aaron morton aa...@thelastpickle.com wrote:

> I used JMX to check current number of threads in a production cassandra machine, and it was ~27,000.

That does not sound too good. My first guess would be lots of client connections. What client are you using? Does it do connection pooling?
See the comments in cassandra.yaml around rpc_server_type; the default, sync, uses one thread per connection, and you may be better off with HSHA. But if your app is leaking connections you should probably deal with that first.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 30/04/2013, at 3:07 AM, William Oberman ober...@civicscience.com wrote:

Hi, I'm having some issues. I keep getting:

ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread

-- after a day or two of runtime. I've checked and my system settings seem acceptable: memlock=unlimited, nofiles=10, nproc=122944