Re: Ring shows high load average when restarting a node.

2016-12-06 Thread Colin Kuo
Hi,

The rapid read protection (speculative retry) feature introduced in
Cassandra 2.0.2 helps in this case. You can find a fuller explanation at
the link below.
http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
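For reference, rapid read protection is configured per table via the
speculative_retry option. A minimal sketch follows; the keyspace and
table names are hypothetical, and the cqlsh invocation is commented out
because it requires a live 2.0.2+ cluster:

```shell
# Write the CQL to a file, then apply it with cqlsh.
# "my_ks.my_table" below is a placeholder -- substitute your own table.
cat > speculative_retry.cql <<'EOF'
-- Send the read to another replica if the first one has not answered
-- within the table's 99th-percentile read latency.
ALTER TABLE my_ks.my_table WITH speculative_retry = '99percentile';
EOF
# cqlsh -f speculative_retry.cql
grep -c "speculative_retry" speculative_retry.cql
```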

Thanks!


On Tue, Dec 6, 2016 at 10:13 AM, Sungju Hong  wrote:

> Hello,
>
> When I restart a node, most of the other nodes show a high load average
> and block queries for one or two minutes.
> Why are the other nodes affected?
>
> - I have a cluster of 70 nodes.
> - Cassandra version 1.2.3
> - RF: 3
> - disabled hinted handoff
>
> I will appreciate any advice.
>
> Thanks.
> Regards.
>
>
>


Re: sstables keep growing on cassandra 2.1

2014-11-19 Thread Colin Kuo
Hi,

Could you first check nodetool compactionstats while the repair is
running? Minor compaction may be blocked by some other task, which would
cause the number of SSTables to keep growing.
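A quick way to watch for this is to track the pending-task count that
compactionstats reports. The sketch below parses an inlined sample of the
output; on a live node, set the variable from the real command instead:

```shell
# Extract the pending-task count from `nodetool compactionstats` output.
# Sample output is inlined here; on a live node use:
#   output="$(nodetool compactionstats)"
output='pending tasks: 67
          compaction type   keyspace   column family   completed       total   unit   progress
               Compaction         ks         mytable    45678901   123456789  bytes     37.00%'
pending=$(printf '%s\n' "$output" | awk -F': *' '/^pending tasks/ {print $2}')
echo "pending=$pending"
```

A number that keeps climbing across repeated runs means compactions are
backing up behind whatever task holds them off.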

On Sat, Nov 15, 2014 at 7:47 AM, James Derieg james.der...@uplynk.com
wrote:

 Hi everyone,
 I'm hoping someone can help me with a weird issue on Cassandra 2.1.
 The sstables on my cluster keep growing to a huge number when I run a
 nodetool repair.  On the attached graph, I ran a manual 'nodetool compact'
 on each node in the cluster, which brought them back down to a low number
 of sstables.  Then I immediately ran a nodetool repair, and the sstables
 jumped back up.  Has anyone seen this behavior?  Is this expected? I have
 some 2.0 clusters in the same environment, and they don't do this.
 Thanks in advance for your help.



Re: decommissioning a cassandra node

2014-10-27 Thread Colin Kuo
Hi Tim,

The node with the IP ending in .94 is in the Leaving state, so something
may have gone wrong while streaming data. You can run nodetool netstats
on both nodes to check whether any streaming connection is stuck.

Alternatively, you can force-remove the leaving node by shutting it down
directly and then running nodetool removenode with the dead node's host
ID. Be aware, though, that you risk losing data if your replication
factor is lower than 3 and the data has not been fully synced. Therefore,
remember to sync data with nodetool repair before removing or
decommissioning a node in the cluster.
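The steps above can be sketched as follows. The status lines are inlined
from this thread; on a real cluster, feed in actual `nodetool status`
output instead, and only run removenode after the node is shut down:

```shell
# Pull the Host ID of the node stuck in UL (Up/Leaving) from
# `nodetool status`-style output, then force-remove it once that
# node's Cassandra process has been stopped.
status='UN  162.243.86.41   1.08 MB  1    0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
UL  162.243.109.94  1.28 MB  256  99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1'
# Host ID is the second-to-last field on each status row.
leaving_id=$(printf '%s\n' "$status" | awk '$1 == "UL" {print $(NF-1)}')
echo "$leaving_id"
# After stopping Cassandra on 162.243.109.94:
# nodetool removenode "$leaving_id"
```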

Thanks!

On Mon, Oct 27, 2014 at 9:55 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Also, is there any document that explains what all the nodetool
 abbreviations (UN, UL) stand for?
 -- The documentation is in the command output itself
 Datacenter: datacenter1
 ===

 *Status=Up/Down*
 *|/ State=Normal/Leaving/Joining/Moving*
 --  Address         Load     Tokens  Owns   Host ID                               Rack
 UN  162.243.86.41   1.08 MB  1       0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
 UL  162.243.109.94  1.28 MB  256     99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
 U = Up, D = Down
 N = Normal, L = Leaving, J = Joining and M = Moving


 Ok, got it, thanks!

 Can someone suggest a good way to fix a node that is in an UL state?

 Thanks
 Tim

 On Mon, Oct 27, 2014 at 9:46 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Also, is there any document that explains what all the nodetool
 abbreviations (UN, UL) stand for?

 -- The documentation is in the command output itself

 Datacenter: datacenter1
 ===
 *Status=Up/Down*
 *|/ State=Normal/Leaving/Joining/Moving*
 --  Address         Load     Tokens  Owns   Host ID                               Rack
 UN  162.243.86.41   1.08 MB  1       0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
 UL  162.243.109.94  1.28 MB  256     99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1

 U = Up, D = Down
 N = Normal, L = Leaving, J = Joining and M = Moving

 On Mon, Oct 27, 2014 at 2:42 PM, Tim Dunphy bluethu...@gmail.com wrote:

 As I see the state 162.243.109.94 is UL(Up/Leaving) so maybe this is
 causing the problem


 OK, that's an interesting observation. How do you fix a node that is in
 an UL state? What causes this?

 Also, is there any document that explains what all the nodetool
 abbreviations (UN, UL) stand for?

 On Mon, Oct 27, 2014 at 5:46 AM, jivko donev jivko_...@yahoo.com
 wrote:

 As I see the state 162.243.109.94 is UL(Up/Leaving) so maybe this is
 causing the problem.


   On Sunday, October 26, 2014 11:57 PM, Tim Dunphy 
 bluethu...@gmail.com wrote:


 Hey all,

  I'm trying to decommission a node.

  First I'm getting a status:

 [root@beta-new:/usr/local] #nodetool status
 Note: Ownership information does not include topology; for complete
 information, specify a keyspace
 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load     Tokens  Owns   Host ID                               Rack
 UN  162.243.86.41   1.08 MB  1       0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
 UL  162.243.109.94  1.28 MB  256     99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1


 But when I try to decommission the node I get this message:

 [root@beta-new:/usr/local] #nodetool -h 162.243.86.41 decommission
 nodetool: Failed to connect to '162.243.86.41:7199' -
 NoSuchObjectException: 'no such object in table'.

 Yet I can telnet to that host on that port just fine:

 [root@beta-new:/usr/local] #telnet 162.243.86.41 7199
 Trying 162.243.86.41...
 Connected to 162.243.86.41.
 Escape character is '^]'.


 And I have verified that cassandra is running and accessible via cqlsh
 on the other machine.

 What could be going wrong?

 Thanks
 Tim


 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B















Re: unable to load data using sstableloader

2014-07-28 Thread Colin Kuo
Have you created the schema for these data files? The schema must exist
in the cluster before you load the data files into C*.

Here is an introductory article on sstableloader that you can refer to:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsBulkloader_t.html
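Note that sstableloader infers the keyspace and table from the last two
components of the directory path, which is why the error below says "No
such keyspace: keyspace" -- the path literally ends in keyspace/col. A
sketch of the expected layout ("demo_ks"/"demo_cf" are hypothetical
names; the loader invocation is commented out because it needs a live
cluster):

```shell
# The directory names must match a schema that already exists on the
# cluster: <keyspace>/<table>/.
mkdir -p demo_ks/demo_cf
# 1. Create the schema first, e.g. in cqlsh:
#      CREATE KEYSPACE demo_ks ...;
#      CREATE TABLE demo_ks.demo_cf (...);
# 2. Move the Test-Data-jb-1-*.db files into demo_ks/demo_cf/
# 3. Stream them in:
#      sstableloader -d localhost demo_ks/demo_cf/
ls -d demo_ks/demo_cf
```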



On Mon, Jul 28, 2014 at 7:28 PM, Akshay Ballarpure 
akshay.ballarp...@tcs.com wrote:


 Hello,
 I am unable to load sstable into cassandra using sstable loader, please
 suggest. Thanks.

 [root@CSL-simulation conf]# pwd
 /root/Akshay/Cassandra/apache-cassandra-2.0.8/conf
 [root@CSL-simulation conf]# ls -ltr keyspace/col/
 total 32
 -rw-r--r-- 1 root root   16 Jul 28 16:55 Test-Data-jb-1-Filter.db
 -rw-r--r-- 1 root root  300 Jul 28 16:55 Test-Data-jb-1-Index.db
 -rw-r--r-- 1 root root 3470 Jul 28 16:55 Test-Data-jb-1-Data.db
 -rw-r--r-- 1 root root8 Jul 28 16:55 Test-Data-jb-1-CRC.db
 -rw-r--r-- 1 root root   64 Jul 28 16:55 Test-Data-jb-1-Digest.sha1
 -rw-r--r-- 1 root root 4392 Jul 28 16:55 Test-Data-jb-1-Statistics.db
 -rw-r--r-- 1 root root   79 Jul 28 16:55 Test-Data-jb-1-TOC.txt


 [root@CSL-simulation conf]# ../bin/sstableloader -d localhost
 /root/Akshay/Cassandra/apache-cassandra-2.0.8/conf/keyspace/col/ --debug
 Could not retrieve endpoint ranges:
 InvalidRequestException(why:No such keyspace: keyspace)
 java.lang.RuntimeException: Could not retrieve endpoint ranges:
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:259)
 at
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
 at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:85)
 Caused by: InvalidRequestException(why:No such keyspace: keyspace)
 at
 org.apache.cassandra.thrift.Cassandra$describe_ring_result$describe_ring_resultStandardScheme.read(Cassandra.java:34055)
 at
 org.apache.cassandra.thrift.Cassandra$describe_ring_result$describe_ring_resultStandardScheme.read(Cassandra.java:34022)
 at
 org.apache.cassandra.thrift.Cassandra$describe_ring_result.read(Cassandra.java:33964)
 at
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1251)
 at
 org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1238)
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:235)
 ... 2 more


 Thanks & Regards
 Akshay Ghanshyam Ballarpure
 Tata Consultancy Services
 Cell:- 9985084075
 Mailto: akshay.ballarp...@tcs.com
 Website: http://www.tcs.com




Re: Advice on how to handle corruption in system/hints

2014-06-09 Thread Colin Kuo
Hi Francois,

We faced the same issue as you. Our approach was to:

1. Scrub the corrupted data file.
2. Repair that column family.

Deleting the corrupted files outright is not recommended while the C*
instance is running. This kind of corruption can happen after a bad disk
or a power outage.
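The two steps above, as a dry-run sketch (the wrapper prints each command
instead of executing it, since both require a live node; "system hints"
is the affected keyspace/table from this thread):

```shell
# Print the recovery commands instead of running them.
run() { echo "+ $*"; }
run nodetool scrub system hints   # rewrite the corrupted sstable(s) in place
run nodetool repair               # then re-sync the affected data
```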

Thanks,

Colin


http://about.me/ColinKuo
Colin Kuo
about.me/ColinKuo
[image: Colin Kuo on about.me]

http://about.me/ColinKuo


On Mon, Jun 9, 2014 at 6:11 AM, Francois Richard frich...@yahoo-inc.com
wrote:

  Hi everyone,

  We are running some Cassandra clusters (Usually a cluster of 5 nodes
 with replication factor of 3.)  And at least once per day we do see some
 corruption related to a specific sstable in system/hints. (We are using
 Cassandra version 1.2.16 on RHEL 6.5)

  Here is an example of such exception:

   ERROR [CompactionExecutor:1694] 2014-06-08 21:37:33,267
 CassandraDaemon.java (line 191) Exception in thread
 Thread[CompactionExecutor:1694,1,main]

 org.apache.cassandra.io.sstable.CorruptSSTableException:
 java.io.IOException: dataSize of 8224262783474088549 starting at 502360510
 would be larger than file /home/y/var/cassandra/data/syste

 m/hints/system-hints-ic-281-Data.db length 504590769

 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:167)

 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:83)

 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:69)

 at
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)

 at
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)

 at
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)

 at
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)

 at
 org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)

 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)

 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)

 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)

 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)

 at
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)

 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)

 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

 at
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)

 at
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)

 at
 org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)

 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

 at java.util.concurrent.FutureTask.run(FutureTask.java:262)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

 at java.lang.Thread.run(Thread.java:745)

 Caused by: java.io.IOException: dataSize of 8224262783474088549 starting
 at 502360510 would be larger than file
 /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length
 504590769

 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:123)

 ... 23 more

  INFO [HintedHandoff:35] 2014-06-08 21:37:33,267
 HintedHandOffManager.java (line 296) Started hinted handoff for host:
 502a48cd-171b-4e83-a9ad-67f32437353a with IP: /10.210.239.190

 ERROR [HintedHandoff:33] 2014-06-08 21:37:33,267 CassandraDaemon.java
 (line 191) Exception in thread Thread[HintedHandoff:33,1,main]

 java.lang.RuntimeException: java.util.concurrent.ExecutionException:
 org.apache.cassandra.io.sstable.CorruptSSTableException:
 java.io.IOException: dataSize of 8224262783474088549 starting at 502360510
 would be larger than file
 /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length
 504590769

 at
 org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441)

 at
 org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)

 at
 org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)

 at
 org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java

Re: high pending compactions

2014-06-09 Thread Colin Kuo
As Jake suggested, you could first increase
compaction_throughput_mb_per_sec and concurrent_compactors to suitable
values if system resources allow. From my understanding, a major
compaction internally acquires a lock before running. In your case, a
major compaction might be blocking the pending compaction tasks queued
behind it. You can check the output of nodetool compactionstats and the
C* system log to confirm.

If the running compaction is spending a long time on wide rows, you could
also try tuning the in_memory_compaction_limit_in_mb value.
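For reference, these knobs live in cassandra.yaml. The values below are
illustrative only, not recommendations -- size them to your hardware
(note the yaml option is spelled concurrent_compactors):

```yaml
compaction_throughput_mb_per_sec: 64    # default 16; 0 disables throttling
concurrent_compactors: 8                # defaults to the number of cores
in_memory_compaction_limit_in_mb: 128   # default 64; larger rows use the slower two-pass path
```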

Thanks,



On Sun, Jun 8, 2014 at 11:27 PM, S C as...@outlook.com wrote:

 I am using Cassandra 1.1 (sorry bit old) and I am seeing high pending
 compaction count. pending tasks: 67 while active compaction tasks are
 not more than 5. I have a 24CPU machine. Shouldn't I be seeing more
 compactions? Is this a pattern of high writes and compactions backing up?
 How can I improve this? Here are my thoughts.


1. Increase memtable_total_space_in_mb
2. Increase compaction_throughput_mb_per_sec
3. Increase concurrent_compactions


 Sorry if this was discussed already. Any pointers is much appreciated.

 Thanks,
 Kumar



Re: How to restart bootstrap after a failed streaming due to Broken Pipe (1.2.16)

2014-06-09 Thread Colin Kuo
You can use nodetool repair instead. Repair is able to re-transmit the
data that belongs to the new node.
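A dry-run sketch (the wrapper prints commands instead of executing them;
the host is the new node's IP from this thread, and the throughput value
is illustrative):

```shell
# Print the recovery commands instead of running them.
run() { echo "+ $*"; }
run nodetool -h 10.156.1.3 repair       # re-sync the partially bootstrapped node
run nodetool setstreamthroughput 200    # optionally cap streaming (Mbit/s) to reduce load
```

Throttling stream throughput trades a longer repair for less CPU and
latency impact on the rest of the ring, which may help avoid the gossip
hiccup that killed the original bootstrap.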



On Tue, Jun 10, 2014 at 10:40 AM, Mike Heffner m...@librato.com wrote:

 Hi,

 During an attempt to bootstrap a new node into a 1.2.16 ring the new node
 saw one of the streaming nodes periodically disappear:

  INFO [GossipTasks:1] 2014-06-10 00:28:52,572 Gossiper.java (line 823)
 InetAddress /10.156.1.2 is now DOWN
 ERROR [GossipTasks:1] 2014-06-10 00:28:52,574 AbstractStreamSession.java
 (line 108) Stream failed because /10.156.1.2 died or was
 restarted/removed (streams may still be active in background, but further
 streams won't be started)
  WARN [GossipTasks:1] 2014-06-10 00:28:52,574 RangeStreamer.java (line
 246) Streaming from /10.156.1.2 failed
  INFO [HANDSHAKE-/10.156.1.2] 2014-06-10 00:28:57,922
 OutboundTcpConnection.java (line 418) Handshaking version with /10.156.1.2
  INFO [GossipStage:1] 2014-06-10 00:28:57,943 Gossiper.java (line 809)
 InetAddress /10.156.1.2 is now UP

 This brief interruption was enough to kill the streaming from node
 10.156.1.2. Node 10.156.1.2 saw a similar broken pipe exception from the
 bootstrapping node:

 ERROR [Streaming to /10.156.193.1.3] 2014-06-10 01:22:02,345
 CassandraDaemon.java (line 191) Exception in thread Thread[Streaming to /
 10.156.1.3:1,5,main]
  java.lang.RuntimeException: java.io.IOException: Broken pipe
 at com.google.common.base.Throwables.propagate(Throwables.java:160)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.io.IOException: Broken pipe
 at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
 at
 sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
 at
 org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
 at
 org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)


 During bootstrapping we notice a significant spike in CPU and latency
 across the board on the ring (CPU 50-85% and write latencies 60ms -
 250ms). It seems likely that this persistent high load led to the hiccup
 that caused the gossiper to see the streaming node as briefly down.

 What is the proper way to recover from this? The original estimate was
 almost 24 hours to stream all the data required to bootstrap this single
 node (streaming set to unlimited) and this occurred 6 hours into the
 bootstrap. With such high load from streaming it seems that simply
 restarting will inevitably hit this problem again.


 Cheers,

 Mike

 --

   Mike Heffner m...@librato.com
   Librato, Inc.




Errors when starting Cassandra after upgrading to 1.2.5 from 1.0.12

2013-05-29 Thread Colin Kuo
Hi All,

We followed the upgrade guide
(http://www.datastax.com/docs/1.2/install/upgrading) from the DataStax
web site and upgraded Cassandra to 1.2.5, but errors appeared in
system.log at startup.

After digging into the code, it looks like Cassandra reads the file
length of an IndexSummary sstable component as zero and therefore throws
an AssertionError. In fact, the IndexSummary file is about 80 bytes, not
zero, which is odd.

We also observed that this only happens on the IndexSummary files of
secondary indexes, and the errors are reproducible. Below are my upgrade
steps.
1. Shut down all client applications.
2. Run nodetool drain before shutting down the existing Cassandra service.
3. Stop the old Cassandra process, then start the new binary with the
migrated cassandra.yaml.
4. Run nodetool upgradesstables -a to rewrite all sstable files in the
new format.
5. Restart the Cassandra process and monitor the log files for any issues.
At step 5, we found the error messages below.

Any ideas?

Thank you!
Colin

===
 INFO [SSTableBatchOpen:2] 2013-05-29 04:38:40,085 SSTableReader.java (line
169) Opening
/var/lib/cassandra/data/ks/user/ks-user.ks_user_personalID-ic-61 (58 bytes)
ERROR [SSTableBatchOpen:1] 2013-05-29 04:38:40,085 CassandraDaemon.java
(line 175) Exception in thread Thread[SSTableBatchOpen:1,5,main]
java.lang.AssertionError
at
org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:401)
at
org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.deserialize(IndexSummary.java:124)
 at
org.apache.cassandra.io.sstable.SSTableReader.loadSummary(SSTableReader.java:426)
at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:360)
 at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:201)
at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:154)
 at
org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
ERROR [SSTableBatchOpen:2] 2013-05-29 04:38:40,085 CassandraDaemon.java
(line 175) Exception in thread Thread[SSTableBatchOpen:2,5,main]
java.lang.AssertionError
at
org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:401)
at
org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.deserialize(IndexSummary.java:124)
 at
org.apache.cassandra.io.sstable.SSTableReader.loadSummary(SSTableReader.java:426)
at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:360)
 at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:201)
at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:154)
 at
org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
===