Re: Ring shows high load average when restarting a node
Hi,

The speculative execution (rapid read protection) feature in Cassandra 2.0 helps in this case. You can find a further explanation at the link below.

http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2

Thanks!

On Tue, Dec 6, 2016 at 10:13 AM, Sungju Hong wrote:
> Hello,
>
> When I restart a node, most of the other nodes show a high load average and
> block queries for one or two minutes.
> Why are the other nodes affected?
>
> - I have a cluster of 70 nodes.
> - Cassandra version 1.2.3
> - RF: 3
> - disabled hinted handoff
>
> I will appreciate any advice.
>
> Thanks.
> Regards.
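For reference, rapid read protection is enabled per table through the speculative_retry property. A minimal sketch, assuming Cassandra 2.0.2+ and hypothetical keyspace/table names myks.mytable:

```shell
# Send a speculative read to another replica if the fastest replica
# has not answered within the table's 99th-percentile read latency:
cqlsh -e "ALTER TABLE myks.mytable WITH speculative_retry = '99percentile';"

# Alternatively, speculate after a fixed delay:
cqlsh -e "ALTER TABLE myks.mytable WITH speculative_retry = '10ms';"
```

This mitigates the symptom (queries blocking while a replica restarts) rather than the restarting node's load itself.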
Re: sstables keep growing on cassandra 2.1
Hi,

Can you first check nodetool compactionstats while the repair is running? I suspect minor compactions are being blocked by some other task, which would cause the number of SSTables to keep growing.

On Sat, Nov 15, 2014 at 7:47 AM, James Derieg james.der...@uplynk.com wrote:

Hi everyone,

I'm hoping someone can help me with a weird issue on Cassandra 2.1. The sstables on my cluster keep growing to a huge number when I run a nodetool repair. On the attached graph, I ran a manual 'nodetool compact' on each node in the cluster, which brought them back down to a low number of sstables. Then I immediately ran a nodetool repair, and the sstables jumped back up. Has anyone seen this behavior? Is this expected? I have some 2.0 clusters in the same environment, and they don't do this. Thanks in advance for your help.
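A simple way to follow the suggestion above is to watch the compaction backlog while the repair runs. The data directory and keyspace/table names below are hypothetical placeholders:

```shell
# Poll pending/active compactions every 10 seconds during the repair
# (assumes nodetool is on PATH and JMX is reachable locally):
watch -n 10 nodetool compactionstats

# Rough check of SSTable growth for one table: count its data files
ls /var/lib/cassandra/data/myks/mytable/*-Data.db | wc -l
```

If pending tasks pile up while only validation compactions are active, that supports the theory that minor compactions are being starved by the repair.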
Re: decommissioning a cassandra node
Hi Tim,

The node with IP .94 is leaving; maybe something went wrong while streaming data. You could run nodetool netstats on both nodes to check whether any streaming connection is stuck.

You could also force out the leaving node by shutting it down directly, then running nodetool removenode to remove the dead node. But understand that you are taking the risk of losing data if your RF is lower than 3 and the data has not been fully synced. Therefore, remember to sync data with a repair before you remove/decommission a node from the cluster.

Thanks!

On Mon, Oct 27, 2014 at 9:55 PM, Tim Dunphy bluethu...@gmail.com wrote:

Also, is there any document that explains what all the nodetool abbreviations (UN, UL) stand for?

-- The documentation is in the command output itself:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load    Tokens Owns  Host ID                              Rack
UN 162.243.86.41   1.08 MB 1      0.1%  e945f3b5-2e3e-4a20-b1bd-e30c474a7634 rack1
UL 162.243.109.94  1.28 MB 256    99.9% fd2f76ae-8dcf-4e93-a37f-bf1e9088696e rack1

U = Up, D = Down
N = Normal, L = Leaving, J = Joining and M = Moving

Ok, got it, thanks! Can someone suggest a good way to fix a node that is in an UL state?

Thanks
Tim

On Mon, Oct 27, 2014 at 9:46 AM, DuyHai Doan doanduy...@gmail.com wrote:

Also, is there any document that explains what all the nodetool abbreviations (UN, UL) stand for?
-- The documentation is in the command output itself:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load    Tokens Owns  Host ID                              Rack
UN 162.243.86.41   1.08 MB 1      0.1%  e945f3b5-2e3e-4a20-b1bd-e30c474a7634 rack1
UL 162.243.109.94  1.28 MB 256    99.9% fd2f76ae-8dcf-4e93-a37f-bf1e9088696e rack1

U = Up, D = Down
N = Normal, L = Leaving, J = Joining and M = Moving

On Mon, Oct 27, 2014 at 2:42 PM, Tim Dunphy bluethu...@gmail.com wrote:

As I see the state 162.243.109.94 is UL (Up/Leaving) so maybe this is causing the problem.

OK, that's an interesting observation. How do you fix a node that is in an UL state? What causes this? Also, is there any document that explains what all the nodetool abbreviations (UN, UL) stand for?

On Mon, Oct 27, 2014 at 5:46 AM, jivko donev jivko_...@yahoo.com wrote:

As I see the state 162.243.109.94 is UL (Up/Leaving) so maybe this is causing the problem.

On Sunday, October 26, 2014 11:57 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all,

I'm trying to decommission a node. First I'm getting a status:

[root@beta-new:/usr/local] #nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load    Tokens Owns  Host ID                              Rack
UN 162.243.86.41   1.08 MB 1      0.1%  e945f3b5-2e3e-4a20-b1bd-e30c474a7634 rack1
UL 162.243.109.94  1.28 MB 256    99.9% fd2f76ae-8dcf-4e93-a37f-bf1e9088696e rack1

But when I try to decommission the node I get this message:

[root@beta-new:/usr/local] #nodetool -h 162.243.86.41 decommission
nodetool: Failed to connect to '162.243.86.41:7199' - NoSuchObjectException: 'no such object in table'.

Yet I can telnet to that host on that port just fine:

[root@beta-new:/usr/local] #telnet 162.243.86.41 7199
Trying 162.243.86.41...
Connected to 162.243.86.41.
Escape character is '^]'.

And I have verified that cassandra is running and accessible via cqlsh on the other machine.
What could be going wrong?

Thanks
Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
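The recovery sequence suggested in this thread (check the streams, then force out the stuck leaver) can be sketched roughly as follows, assuming Cassandra 1.2+ for nodetool removenode; the IP addresses and host ID are taken from the nodetool status output above:

```shell
# 1. Check both nodes for stuck streaming sessions
nodetool -h 162.243.86.41 netstats
nodetool -h 162.243.109.94 netstats

# 2. If streaming is wedged: stop Cassandra on the leaving node,
#    then remove it from the ring by host ID from a live node
nodetool -h 162.243.86.41 removenode fd2f76ae-8dcf-4e93-a37f-bf1e9088696e

# 3. Repair afterwards to re-sync any under-replicated data
nodetool -h 162.243.86.41 repair
```

As noted above, skipping the repair risks data loss when RF < 3 or replicas were not fully synced before the node left.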
Re: unable to load data using sstableloader
Have you created the schema for these data files? The schema must exist in C* before you load the data files, and sstableloader derives the target keyspace and column family from the last two directories of the path you give it. Here is an introductory article on sstableloader you can refer to:

http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsBulkloader_t.html

On Mon, Jul 28, 2014 at 7:28 PM, Akshay Ballarpure akshay.ballarp...@tcs.com wrote:

Hello,

I am unable to load an sstable into Cassandra using sstableloader, please suggest. Thanks.

[root@CSL-simulation conf]# pwd
/root/Akshay/Cassandra/apache-cassandra-2.0.8/conf
[root@CSL-simulation conf]# ls -ltr keyspace/col/
total 32
-rw-r--r-- 1 root root   16 Jul 28 16:55 Test-Data-jb-1-Filter.db
-rw-r--r-- 1 root root  300 Jul 28 16:55 Test-Data-jb-1-Index.db
-rw-r--r-- 1 root root 3470 Jul 28 16:55 Test-Data-jb-1-Data.db
-rw-r--r-- 1 root root    8 Jul 28 16:55 Test-Data-jb-1-CRC.db
-rw-r--r-- 1 root root   64 Jul 28 16:55 Test-Data-jb-1-Digest.sha1
-rw-r--r-- 1 root root 4392 Jul 28 16:55 Test-Data-jb-1-Statistics.db
-rw-r--r-- 1 root root   79 Jul 28 16:55 Test-Data-jb-1-TOC.txt
[root@CSL-simulation conf]# ../bin/sstableloader -d localhost /root/Akshay/Cassandra/apache-cassandra-2.0.8/conf/keyspace/col/ --debug
Could not retrieve endpoint ranges: InvalidRequestException(why:No such keyspace: keyspace)
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:259)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:85)
Caused by: InvalidRequestException(why:No such keyspace: keyspace)
    at org.apache.cassandra.thrift.Cassandra$describe_ring_result$describe_ring_resultStandardScheme.read(Cassandra.java:34055)
    at org.apache.cassandra.thrift.Cassandra$describe_ring_result$describe_ring_resultStandardScheme.read(Cassandra.java:34022)
    at org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:33964)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1251)
    at org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1238)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:235)
    ... 2 more

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell: 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com
Experience certainty. IT Services / Business Solutions / Consulting
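The "No such keyspace: keyspace" error above shows that sstableloader took the keyspace name from the directory path. A minimal sketch of the fix, using hypothetical names ks1/cf1 and an illustrative table definition (your real schema must match the data in the sstables):

```shell
# 1. Create the schema first; sstableloader cannot create it for you.
cqlsh -e "CREATE KEYSPACE ks1 WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 1};"
cqlsh -e "CREATE TABLE ks1.cf1 (id text PRIMARY KEY, val text);"

# 2. Lay the sstable files out under <dir>/<keyspace>/<table>/,
#    since the last two path components name the load target:
mkdir -p /tmp/load/ks1/cf1
cp keyspace/col/*.db keyspace/col/*.txt keyspace/col/*.sha1 /tmp/load/ks1/cf1/

# 3. Point the loader at that directory
bin/sstableloader -d localhost /tmp/load/ks1/cf1/
```

Note also that the sstable file names themselves encode a keyspace and column family (here "Test"/"Data"), so they should be consistent with the target schema as well.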
Re: Advice on how to handle corruption in system/hints
Hi Francois,

We faced the same issue. The approach we took was to:

1. scrub the corrupted data file
2. repair that column family

Deleting the corrupted files outright is not recommended while the C* instance is running. This kind of corruption can happen after a bad disk or a power outage.

Thanks,
Colin
http://about.me/ColinKuo

On Mon, Jun 9, 2014 at 6:11 AM, Francois Richard frich...@yahoo-inc.com wrote:

Hi everyone,

We are running some Cassandra clusters (usually a cluster of 5 nodes with a replication factor of 3), and at least once per day we see some corruption related to a specific sstable in system/hints. (We are using Cassandra version 1.2.16 on RHEL 6.5.)

Here is an example of such an exception:

ERROR [CompactionExecutor:1694] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:1694,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:167)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:83)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:69)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:123)
    ... 23 more

INFO [HintedHandoff:35] 2014-06-08 21:37:33,267 HintedHandOffManager.java (line 296) Started hinted handoff for host: 502a48cd-171b-4e83-a9ad-67f32437353a with IP: /10.210.239.190
ERROR [HintedHandoff:33] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:33,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
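The two-step recovery suggested above (scrub, then repair) can be sketched as follows. Since the corruption here is in system/hints, the scrub targets the system keyspace's hints column family; the user keyspace/table names in the repair step are hypothetical placeholders:

```shell
# 1. Rewrite the corrupt hints sstables, skipping unreadable rows
#    (nodetool scrub [keyspace] [cfnames] is available in 1.2):
nodetool scrub system hints

# 2. Repair the column family the hints were destined for, so any
#    mutations lost with the skipped rows are re-synced from replicas:
nodetool repair myks mycf
```

Scrubbing can drop the rows it cannot read, which is why the follow-up repair matters.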
Re: high pending compactions
As Jake suggested, you could first increase compaction_throughput_mb_per_sec and concurrent_compactors to suitable values if system resources allow. From my understanding, a major compaction internally acquires a lock before running. In your case, a long-running major compaction might be blocking the pending compaction tasks behind it. You could check the output of nodetool compactionstats and the C* system log to double-check. If the running compaction has been compacting a wide row for a long time, you could try tuning the in_memory_compaction_limit_in_mb value.

Thanks,

On Sun, Jun 8, 2014 at 11:27 PM, S C as...@outlook.com wrote:

I am using Cassandra 1.1 (sorry, a bit old) and I am seeing a high pending compaction count:

pending tasks: 67

while there are no more than 5 active compaction tasks. I have a 24-CPU machine. Shouldn't I be seeing more compactions? Is this a pattern of high writes with compactions backing up? How can I improve this? Here are my thoughts:

1. Increase memtable_total_space_in_mb
2. Increase compaction_throughput_mb_per_sec
3. Increase concurrent_compactions

Sorry if this was discussed already. Any pointers are much appreciated.

Thanks,
Kumar
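The compaction throughput cap can also be raised on a live node without editing cassandra.yaml or restarting; the value below is illustrative:

```shell
# Raise the compaction throughput cap to 64 MB/s on this node
# (0 disables throttling entirely):
nodetool setcompactionthroughput 64

# Then watch whether the pending-task count starts to drain:
nodetool compactionstats
```

concurrent_compactors, by contrast, is a cassandra.yaml setting and takes effect on restart.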
Re: How to restart bootstrap after a failed streaming due to Broken Pipe (1.2.16)
You can use nodetool repair instead. Repair is able to re-transmit the data that belongs to the new node.

On Tue, Jun 10, 2014 at 10:40 AM, Mike Heffner m...@librato.com wrote:

Hi,

During an attempt to bootstrap a new node into a 1.2.16 ring, the new node saw one of the streaming nodes periodically disappear:

INFO [GossipTasks:1] 2014-06-10 00:28:52,572 Gossiper.java (line 823) InetAddress /10.156.1.2 is now DOWN
ERROR [GossipTasks:1] 2014-06-10 00:28:52,574 AbstractStreamSession.java (line 108) Stream failed because /10.156.1.2 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
WARN [GossipTasks:1] 2014-06-10 00:28:52,574 RangeStreamer.java (line 246) Streaming from /10.156.1.2 failed
INFO [HANDSHAKE-/10.156.1.2] 2014-06-10 00:28:57,922 OutboundTcpConnection.java (line 418) Handshaking version with /10.156.1.2
INFO [GossipStage:1] 2014-06-10 00:28:57,943 Gossiper.java (line 809) InetAddress /10.156.1.2 is now UP

This brief interruption was enough to kill the streaming from node 10.156.1.2.
Node 10.156.1.2 saw a similar broken pipe exception from the bootstrapping node:

ERROR [Streaming to /10.156.193.1.3] 2014-06-10 01:22:02,345 CassandraDaemon.java (line 191) Exception in thread Thread[Streaming to /10.156.1.3:1,5,main]
java.lang.RuntimeException: java.io.IOException: Broken pipe
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
    at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

During bootstrapping we noticed a significant spike in CPU and latency across the board on the ring (CPU 50-85% and write latencies 60ms - 250ms). It seems likely that this persistent high load led to the hiccup that caused the gossiper to see the streaming node as briefly down.

What is the proper way to recover from this? The original estimate was almost 24 hours to stream all the data required to bootstrap this single node (streaming set to unlimited), and this occurred 6 hours into the bootstrap. With such high load from streaming it seems that simply restarting will inevitably hit this problem again.

Cheers,
Mike

--
Mike Heffner m...@librato.com
Librato, Inc.
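One way to act on the advice above is to throttle streaming so the ring is not saturated, then let repair deliver the new node's data instead of restarting the full bootstrap. The throughput value is illustrative, and <new-node-ip> is a placeholder:

```shell
# Cap outbound streaming on the existing nodes (megabits/s in 1.2;
# 0 means unthrottled) so streaming no longer overwhelms them:
nodetool setstreamthroughput 200

# Once the new node is in the ring, repair re-transmits the ranges
# it owns, filling in whatever the failed bootstrap did not stream:
nodetool -h <new-node-ip> repair
```

This trades a longer transfer time for a much lower chance of the gossip hiccup that killed the original stream.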
Errors when starting up Cassandra after upgrading from 1.0.12 to 1.2.5
Hi All,

We followed the upgrade guide (http://www.datastax.com/docs/1.2/install/upgrading) from the Datastax web site and upgraded Cassandra to 1.2.5, but errors appeared in system.log on startup. After digging into the code, it looks like Cassandra found the file length of an IndexSummary sstable component to be zero and therefore threw an AssertionError. In fact, the length of the IndexSummary file is about 80 bytes, not zero, which is weird. We also observed that this only happens on the IndexSummary files of secondary indexes. The errors are reproducible. Below are my upgrade steps:

1. Shut down all of the client applications.
2. Run nodetool drain before shutting down the existing Cassandra service.
3. Stop the old Cassandra process, then start the new binary using the migrated cassandra.yaml.
4. Run nodetool upgradesstables -a to upgrade all of the sstable files to the new format.
5. Restart the Cassandra process and monitor the log files for any issues.

At step 5, we found the error messages below. Any ideas? Thank you!
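For reference, the steps above can be sketched as a per-node sequence; the service names, paths, and log location are assumptions that vary by install:

```shell
# Run on each node in turn (rolling upgrade).
nodetool drain                  # flush memtables and stop accepting writes
sudo service cassandra stop     # stop the old 1.0.12 process
# ... install the 1.2.5 binaries and migrate cassandra.yaml ...
sudo service cassandra start    # start the new binary
nodetool upgradesstables -a     # rewrite every sstable in the new format

# Restart and watch the logs for errors like the ones below:
sudo service cassandra restart
tail -f /var/log/cassandra/system.log
```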
Colin

===
INFO [SSTableBatchOpen:2] 2013-05-29 04:38:40,085 SSTableReader.java (line 169) Opening /var/lib/cassandra/data/ks/user/ks-user.ks_user_personalID-ic-61 (58 bytes)
ERROR [SSTableBatchOpen:1] 2013-05-29 04:38:40,085 CassandraDaemon.java (line 175) Exception in thread Thread[SSTableBatchOpen:1,5,main]
java.lang.AssertionError
    at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:401)
    at org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.deserialize(IndexSummary.java:124)
    at org.apache.cassandra.io.sstable.SSTableReader.loadSummary(SSTableReader.java:426)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:360)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:201)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:154)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR [SSTableBatchOpen:2] 2013-05-29 04:38:40,085 CassandraDaemon.java (line 175) Exception in thread Thread[SSTableBatchOpen:2,5,main]
java.lang.AssertionError
    at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:401)
    at org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.deserialize(IndexSummary.java:124)
    at org.apache.cassandra.io.sstable.SSTableReader.loadSummary(SSTableReader.java:426)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:360)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:201)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:154)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
===