Re: What is a node's "counter ID?"

2017-10-20 Thread Blake Eggleston
I believe that's just referencing a counter implementation detail. If I
remember correctly, there was a fairly large improvement to the counter
implementation in 2.1, and the assignment of the ID would basically be a
format migration.

> On Oct 20, 2017, at 9:57 AM, Paul Pollack wrote:
> 
> Hi,
> 
> I was reading the doc page for nodetool cleanup
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html
> because I was planning to run it after replacing a node in my counter cluster,
> and the sentence "Cassandra assigns a new counter ID to the node" gave me
> pause. I can't find any other reference to a node's counter ID in the docs,
> and I was wondering if anyone here could shed light on what this means and
> how it would affect the data stored on a node that had its counter ID
> changed?
> 
> Thanks,
> Paul


What is a node's "counter ID?"

2017-10-20 Thread Paul Pollack
Hi,

I was reading the doc page for nodetool cleanup
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html
because I was planning to run it after replacing a node in my counter
cluster, and the sentence "Cassandra assigns a new counter ID to the node"
gave me pause. I can't find any other reference to a node's counter ID in
the docs, and I was wondering if anyone here could shed light on what this
means and how it would affect the data stored on a node that had its
counter ID changed?

Thanks,
Paul


Lot of hints piling up

2017-10-20 Thread Jai Bheemsen Rao Dhanwada
Hello,

We have a Cassandra cluster in 3 regions running version 2.1.13, and all of a
sudden we started seeing a lot of hints accumulating on the nodes. We are
pretty sure there is no issue with the network between the regions, and all
the nodes are up and running all the time.

Is there any other reason for hints to accumulate other than the network?
e.g. wide rows or bigger objects?

Any pointers here would be very helpful.

BTW, the hints do get processed after some time.
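
As a rough first check (a sketch, not from the thread; the host is a placeholder, and this assumes 2.1, where hints still live in the system.hints table), something like this shows whether hints are piling up and whether replicas are dropping mutations:

# Count stored hints on a 2.1 node (hints moved out of the system.hints table in 3.0).
cqlsh 10.0.0.1 -e "SELECT count(*) FROM system.hints;"

# Hints also build up when replicas drop or time out mutations (e.g. oversized
# writes or long GC pauses), so the dropped-message counters are worth a look.
nodetool tpstats | grep -A 15 "Message type"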


Re: Best approach to prepare to shutdown a cassandra node

2017-10-20 Thread Lutaya Shafiq Holmes
Looking at the code in trunk, the stopdaemon command invokes the
CassandraDaemon.stop() function, which does a graceful shutdown by
stopping the jmxServer and draining the node via the shutdown hook.
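
A minimal sketch of that sequence (assuming nodetool for the running install is on the PATH; the explicit drain is optional given the shutdown hook, but harmless):

# Flush memtables and stop accepting writes, then stop the daemon;
# per the above, stopdaemon's shutdown hook drains as well.
nodetool drain
nodetool stopdaemon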


Re: Best approach to prepare to shutdown a cassandra node

2017-10-20 Thread Simon Fontana Oscarsson

Yes, drain will always be run when Cassandra exits normally.

On 2017-10-20 00:57, Varun Gupta wrote:
Does nodetool stopdaemon implicitly drain too, or should we invoke
drain and then stopdaemon?


On Mon, Oct 16, 2017 at 4:54 AM, Simon Fontana Oscarsson wrote:


Looking at the code in trunk, the stopdaemon command invokes the
CassandraDaemon.stop() function, which does a graceful shutdown by
stopping the jmxServer and draining the node via the shutdown hook.

/Simon


On 2017-10-13 20:42, Javier Canillas wrote:

As far as I know, nodetool stopdaemon does a "kill -9".

Or did it change?

2017-10-12 23:49 GMT-03:00 Anshu Vajpayee:

Why are you killing when we have nodetool stopdaemon ?

On Fri, Oct 13, 2017 at 1:49 AM, Javier Canillas wrote:

That's what I thought.

Thanks!

2017-10-12 14:26 GMT-03:00 Hannu Kröger:

Hi,

Drain should be enough. It stops accepting writes,
and after that Cassandra can be safely shut down.

Hannu

On 12 October 2017 at 20:24:41, Javier Canillas
(javier.canil...@gmail.com) wrote:


Hello everyone,

I have been working with Cassandra for some time, but every
time I need to shut down a node (for any reason, like
upgrading the version or moving the instance to another
host) I see several errors in the client
applications (yes, I'm using the official Java driver).

By the way, I'm starting C* as a stand-alone process,
and the C* version is 3.11.0.

The way I have implemented the shutdown process is
something like the following:

# Drain all information from commitlog into sstables
bin/nodetool drain

cassandra_pid=`ps -ef | grep "java.*apache-cassandra" | grep -v "grep" | awk '{print $2}'`
if [ ! -z "$cassandra_pid" ] && [ "$cassandra_pid" -ne "1" ]; then
  echo "Asking Cassandra to shutdown (nodetool drain doesn't stop cassandra)"
  kill $cassandra_pid

  echo -n "+ Checking it is down. "
  counter=10
  # Wait up to 10 seconds for the process to exit
  while [ "$counter" -ne 0 ] && kill -0 $cassandra_pid > /dev/null 2>&1
  do
          echo -n ". "
          ((counter--))
          sleep 1s
  done
  echo ""
  if ! kill -0 $cassandra_pid > /dev/null 2>&1; then
          echo "+ It's down."
  else
          echo "- Killing Cassandra."
          kill -9 $cassandra_pid
  fi
else
  echo "Care, there was a problem finding the Cassandra PID"
fi
Should I add the following lines at the beginning?

echo "shutting down cassandra gracefully with: nodetool disablegossip"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablegossip
echo "shutting down cassandra gracefully with: nodetool disablebinary"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablebinary
echo "shutting down cassandra gracefully with: nodetool disablethrift"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablethrift

The shutdown log is the following:

WARN [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,343 StorageService.java:321 - Stopping gossip by operator request
INFO [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,344 Gossiper.java:1532 - Announcing shutdown
INFO [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,355 StorageService.java:2268 - Node /10.254.169.36 state jump to shutdown
INFO [RMI TCP Connection(12)-127.0.0.1] 2017-10-12
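
Pulling the suggestions in this thread together, a shutdown sequence might look roughly like the sketch below; the nodetool path and the 10-second wait are assumptions, not a verified procedure:

#!/bin/sh
# Sketch: stop client traffic and gossip, drain, then stop the JVM with a plain TERM.
NODETOOL="$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool"

$NODETOOL disablebinary      # stop accepting native-protocol clients
$NODETOOL disablethrift      # stop accepting Thrift clients (if enabled)
$NODETOOL disablegossip      # announce shutdown to the rest of the cluster
$NODETOOL drain              # flush memtables and stop accepting writes

cassandra_pid=$(ps -ef | grep "java.*apache-cassandra" | grep -v grep | awk '{print $2}')
if [ -n "$cassandra_pid" ]; then
  kill "$cassandra_pid"                        # plain TERM, not kill -9
  i=0
  while [ $i -lt 10 ] && kill -0 "$cassandra_pid" > /dev/null 2>&1; do
    sleep 1                                    # wait up to ~10 seconds for exit
    i=$((i + 1))
  done
fi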

Re: split one DC from a cluster

2017-10-20 Thread Peng Xiao
Thanks Kurt, we may still use snapshot and sstableloader to split this
schema out to another cluster.




-- Original Message --
From: "kurt"
Date: Thursday, October 19, 2017, 6:11 PM
To: "User"
Subject: Re: split one DC from a cluster



The easiest way is to separate them via a firewall/network partition so the DCs
can't talk to each other, ensure each DC sees the other DC as DOWN, then remove
the other DC from replication, then remove all the nodes in the opposite DC
using removenode.
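
For example, the replication change and node removal could look roughly like this; the keyspace name, DC name/RF, and host ID are placeholders:

# Run from the DC you are keeping, once the two DCs can no longer see each other.
# Repeat the ALTER for every affected keyspace.
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"

# Remove each node of the split-off DC (shown as DN in nodetool status) by its host ID.
nodetool removenode 11111111-2222-3333-4444-555555555555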

Re: Not marking node down due to local pause

2017-10-20 Thread Alexander Dejanovski
Hi John,

The other main source of STW pauses in the JVM is the safepoint mechanism:
http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html

If you turn on full GC logging in your cassandra-env.sh file, you will find
lines like this:

2017-10-09T20:13:42.462+: 4.890: Total time for which application
threads were stopped: 0.0003137 seconds, Stopping threads took: 0.0001163
seconds
2017-10-09T20:13:42.472+: 4.899: Total time for which application
threads were stopped: 0.0001622 seconds, Stopping threads took: 0.361
seconds
2017-10-09T20:13:46.162+: 8.590: Total time for which application
threads were stopped: 2.6899536 seconds, Stopping threads took: 2.6899004
seconds
2017-10-09T20:13:46.162+: 8.590: Total time for which application
threads were stopped: 0.0002418 seconds, Stopping threads took: 0.456
seconds
2017-10-09T20:13:46.461+: 8.889: Total time for which application
threads were stopped: 0.0002654 seconds, Stopping threads took: 0.397
seconds
2017-10-09T20:13:46.478+: 8.906: Total time for which application
threads were stopped: 0.0001646 seconds, Stopping threads took: 0.791
seconds

These aren't GCs, but you can still see that we have a 2.6s pause, with most
of the time spent waiting for threads to reach the safepoint.
When we saw this in the past, it was due to faulty disks that were
preventing the read threads from reaching the safepoint.

If you want to specifically identify the threads that were stuck, you can
set a timeout on the safepoints:

# GC logging options
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput
-XX:LogFile=/var/log/cassandra/vm.log"
JVM_OPTS="$JVM_OPTS -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=5000"



Check the duration of the pauses you're seeing on your nodes and set a
shorter timeout (it should be fairly fast to reach the safepoint); above it is
set to 5s.
Restart your Cassandra process with the above settings, and wait for one
pause to happen. Then stop Cassandra and it will output information in
the /var/log/cassandra/vm.log file (that only happens when the process
stops; nothing gets written there before that).

If indeed some threads were preventing the safepoint, they'll get listed
there.
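
A quick way to pull the long pauses out of the GC log afterwards (the log path and the one-second threshold are placeholders) is to filter the "Total time for which application threads were stopped" lines shown above:

# Print every GC/safepoint pause longer than 1 second from the GC log.
awk '/Total time for which application threads were stopped/ {
  for (i = 1; i <= NF; i++) if ($i == "stopped:") t = $(i + 1) + 0
  if (t > 1.0) print
}' /var/log/cassandra/gc.log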

Let us know how it goes.

Cheers,


On Fri, Oct 20, 2017 at 5:11 AM John Sanda wrote:

> I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a
> lot of these messages in both logs:
>
> WARN  07:23:16 Not marking nodes down due to local pause of 7219277694 >
> 5000000000
>
> I am fairly certain that they are not due to GC. I am not seeing a whole
> lot of GC being logged, and nothing over 500 ms. I do think it is I/O related.
>
> I am seeing lots of read timeouts for queries to a table that has a large
> growing number of SSTables. At last count there are over 1800 SSTables on
> one node. The count is lower on the other node, and I suspect that this is
> due to data distribution. Slowly but surely the number of SSTables keeps
> going up, and not surprisingly nodetool tablehistograms reports high
> latencies. The table is using STCS.
>
> I am seeing some but not a whole lot of dropped mutations. nodetool
> tpstats looks ok.
>
> The growing number of SSTables really makes me think this is an I/O issue.
> Cassandra is running in a Kubernetes cluster using a SAN, which is another
> reason I suspect I/O.
>
> What are some things I can look at/test to determine what is causing all
> of these local pauses?
>
>
> - John
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com