Re: handling down node cassandra 2.0.15

2015-11-30 Thread Robert Coli
On Wed, Nov 18, 2015 at 6:16 AM, Anuj Wadehra 
wrote:

> Suppose gc_grace_seconds = 10 days, max hinted handoff period = 3 hrs,
> there are 3 nodes A, B and C, RF = 3, and my client is reading at CL ONE.
> C remains down for 5 hours and misses many updates, including those that
> happened after the max hinted handoff period of 3 hrs. Now I bring back
> node C with auto_bootstrap false and run repair. If the client queries at
> CL ONE and fetches a row that was updated after the max hinted handoff
> period, there is a very high possibility of the client getting stale data
> from node C. But as soon as node C has joined the ring, it will start
> participating in WRITEs.
>
> But if we follow the procedure you suggested, node C will come back and
> run repair, but won't participate in reads until we join it to the
> cluster. During repair, if the client queries at CL ONE and fetches a row
> that was updated after the max hinted handoff period expired and was missed
> by node C, it will still get the latest data from A or B. So data integrity
> is not lost, just as when we bootstrap with auto_bootstrap set to true.
> Additionally, we save the unique data of node C. While repair is going on,
> node C will get all the writes.
>

Yes, during this time, C is getting "extra" writes as it is repairing
itself vis a vis A and B, but it is not serving reads.

=Rob


Re: handling down node cassandra 2.0.15

2015-11-18 Thread Anuj Wadehra
Robert,


This is how I interpret the implications of the 3 steps you suggested. Please
confirm my interpretation, as this is really important.


Suppose gc_grace_seconds = 10 days, max hinted handoff period = 3 hrs, there
are 3 nodes A, B and C, RF = 3, and my client is reading at CL ONE. C remains
down for 5 hours and misses many updates, including those that happened after
the max hinted handoff period of 3 hrs. Now I bring back node C with
auto_bootstrap false and run repair. If the client queries at CL ONE and
fetches a row that was updated after the max hinted handoff period, there is a
very high possibility of the client getting stale data from node C. But as
soon as node C has joined the ring, it will start participating in WRITEs.


But if we follow the procedure you suggested, node C will come back and run
repair, but won't participate in reads until we join it to the cluster. During
repair, if the client queries at CL ONE and fetches a row that was updated
after the max hinted handoff period expired and was missed by node C, it will
still get the latest data from A or B. So data integrity is not lost, just as
when we bootstrap with auto_bootstrap set to true. Additionally, we save the
unique data of node C. While repair is going on, node C will get all the writes.
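The scenario above is mostly arithmetic on two settings. As a rough sketch (a hypothetical helper, not a Cassandra tool), the function below classifies a node's downtime against max_hint_window_in_ms and gc_grace_seconds. It assumes the downtime is known in seconds and that every write made after the hint window expired was not hinted.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Cassandra.
# Usage: classify_downtime DOWNTIME_S MAX_HINT_WINDOW_MS GC_GRACE_S
classify_downtime() {
  down_s=$1
  hint_window_s=$(( $2 / 1000 ))   # max_hint_window_in_ms is in milliseconds
  gc_grace_s=$3
  if [ "$down_s" -le "$hint_window_s" ]; then
    # Coordinators kept hints for the whole outage; replay should catch up.
    echo "hints-cover-all"
  elif [ "$down_s" -lt "$gc_grace_s" ]; then
    # Writes after the hint window expired were not hinted: repair is needed.
    echo "repair-needed unhinted_seconds=$(( down_s - hint_window_s ))"
  else
    # Tombstones may already have been purged on the other replicas.
    echo "past-gc-grace"
  fi
}
```

With the numbers above (5 hours down, a 3-hour window of 10800000 ms, gc_grace_seconds of 10 days), `classify_downtime 18000 10800000 864000` prints `repair-needed unhinted_seconds=7200`: the last two hours of writes exist only on A and B until repair runs.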


@Anishek

Hinted handoffs are not related to gc_grace_seconds.


Thanks

Anuj

Sent from Yahoo Mail on Android




Re: handling down node cassandra 2.0.15

2015-11-18 Thread Anishek Agarwal
@Rob Interesting, something I will try next time. For step 3 you mentioned,
do I just remove the -Dcassandra.join_ring=false option and restart the
Cassandra service?

@Anuj, gc_grace_seconds dictates how long hinted handoffs are stored, right?
That might matter where we explicitly delete values from the table, but we
just have TTLs, and DTCS should delete data older than 1 month. In this
case, do I need to wipe the node and then copy the keyspace again, or can I
run a repair once it joins the ring with auto_bootstrap=false?





Re: handling down node cassandra 2.0.15

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 4:33 AM, Anuj Wadehra 
wrote:

> Only if gc_grace_seconds haven't passed since the failure. If your machine
> is down for more than gc_grace_seconds, you need to delete the data
> directory and go with auto_bootstrap = true.
>

Since CASSANDRA-6961 you can:

1) bring up the node with join_ring=false
2) repair it
3) join it to the cluster

https://issues.apache.org/jira/browse/CASSANDRA-6961

This prevents you from decreasing your unique replica count, which is
usually a good thing!

=Rob
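The three steps can be sketched as a dry-run shell function. This is an illustration under assumptions, not a tested runbook: it assumes a package install where the service is called `cassandra` and cassandra-env.sh lives at `/etc/cassandra/cassandra-env.sh` (`CASSANDRA_ENV` is a hypothetical override), and it only prints commands unless `DRY_RUN=0`. Step 3 uses `nodetool join`, which joins the ring without another restart.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Append a JVM system property to cassandra-env.sh. $JVM_OPTS is written
# literally and expanded later, when the start script sources the file.
append_jvm_opt() {
  printf 'JVM_OPTS="$JVM_OPTS %s"\n' "$2" >> "$1"
}

recover_with_join_ring_false() {
  env_file=${CASSANDRA_ENV:-/etc/cassandra/cassandra-env.sh}
  # 1) bring the node up with join_ring=false (it does not serve reads)
  run append_jvm_opt "$env_file" -Dcassandra.join_ring=false
  run service cassandra restart
  # 2) repair it against the other replicas
  run nodetool repair
  # 3) join it to the cluster
  run nodetool join
}
```

Whichever way step 3 is done, the join_ring=false line should be removed from cassandra-env.sh afterwards; otherwise the next restart keeps the node out of the ring again.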


Re: handling down node cassandra 2.0.15

2015-11-17 Thread Anuj Wadehra
Only if gc_grace_seconds haven't passed since the failure. If your machine is
down for more than gc_grace_seconds, you need to delete the data directory and
go with auto_bootstrap = true.


Thanks

Anuj
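For the case described above (down longer than gc_grace_seconds), a dry-run sketch of wiping the node follows. The paths are assumed package defaults (check data_file_directories, commitlog_directory and saved_caches_directory in cassandra.yaml), and `CASSANDRA_DATA_ROOT` and `CASSANDRA_ENV` are hypothetical override variables. The replace_address step borrows Josh's suggestion from elsewhere in the thread; that a wiped node rejoining under its old IP is the case replace_address is meant for is our assumption and worth verifying.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

append_jvm_opt() {
  # $JVM_OPTS stays literal here; cassandra-env.sh expands it at startup.
  printf 'JVM_OPTS="$JVM_OPTS %s"\n' "$2" >> "$1"
}

# Usage: wipe_and_bootstrap NODE_IP   (this node's own, unchanged address)
wipe_and_bootstrap() {
  node_ip=$1
  root=${CASSANDRA_DATA_ROOT:-/var/lib/cassandra}
  env_file=${CASSANDRA_ENV:-/etc/cassandra/cassandra-env.sh}
  run service cassandra stop
  # Drop everything, including the system keyspace, so the node starts empty.
  run rm -rf "$root/data" "$root/commitlog" "$root/saved_caches"
  # auto_bootstrap defaults to true. Assumption: replace_address lets the
  # empty node take over the tokens still registered under its old address.
  run append_jvm_opt "$env_file" "-Dcassandra.replace_address=$node_ip"
  run service cassandra start
}
```

Once bootstrapping has finished, the replace_address line should come out of cassandra-env.sh again; left in place, the next restart would most likely fail with the same "already bootstrapped" error discussed in this thread.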


From:"Anishek Agarwal" 
Date:Tue, 17 Nov, 2015 at 10:52 am
Subject:Re: handling down node cassandra 2.0.15

Hey Anuj,


OK, I will try that next time. So you are saying that since I am replacing the
machine in place (trying to get the same machine back in the cluster), and it
already has some data, I don't clean the commitlog/data directories; I set
auto_bootstrap = false, restart the node, and then run repair on this
machine, right?


thanks

anishek



Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
Hey Anuj,

OK, I will try that next time. So you are saying that since I am replacing the
machine in place (trying to get the same machine back in the cluster), and it
already has some data, I don't clean the commitlog/data directories; I set
auto_bootstrap = false, restart the node, and then run repair on this
machine, right?

thanks
anishek

On Mon, Nov 16, 2015 at 11:40 PM, Anuj Wadehra 
wrote:

> Hi Abhishek,
>
> In my opinion, you already have data and bootstrapping is not needed here.
> You can set auto_bootstrap to false in cassandra.yaml, and once
> Cassandra is restarted, you should run repair to fix the inconsistent data.
>
>
> Thanks
> Anuj


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
Hey Josh

I did set replace_address, to the same address as the machine that went
down, since I was bringing it back in place.

anishek
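Setting replace_address to the node's own address while its old data is still on disk matches the error in this thread. As far as we understand Cassandra 2.0, a node records in its system keyspace that it has completed bootstrap, and replace_address is refused for such a node. The hypothetical pre-flight check below (not part of Cassandra) flags that combination; it assumes you pass the path of cassandra-env.sh and of the data directory (package default /var/lib/cassandra/data).

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Cassandra.
# Usage: check_replace_address ENV_FILE DATA_DIR
check_replace_address() {
  env_file=$1
  data_dir=$2
  # Only count active (uncommented) JVM_OPTS lines that set replace_address.
  if ! grep -q '^[[:space:]]*JVM_OPTS=.*-Dcassandra\.replace_address=' "$env_file"; then
    echo "ok: replace_address not set"
  elif [ -n "$(ls -A "$data_dir/system" 2>/dev/null)" ]; then
    # Old system keyspace still present: expect "already bootstrapped".
    echo "conflict: replace_address set but $data_dir/system is not empty"
  else
    echo "ok: replacing with an empty node"
  fi
}
```

For a node that kept its data, the thread's answer is to leave replace_address out: restart on the existing data and repair, or use join_ring=false as Rob describes.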

On Mon, Nov 16, 2015 at 10:33 PM, Josh Smith 
wrote:

> Did you set the JVM_OPTS to replace address? That is usually the error I
> get when I forget to set the replace_address in cassandra-env.sh.
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anuj Wadehra
Did you set the JVM_OPTS to replace address? That is usually the error I get
when I forget to set the replace_address in cassandra-env.sh.

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anuj Wadehra
Hi Abhishek,
In my opinion, you already have data and bootstrapping is not needed here. You
can set auto_bootstrap to false in cassandra.yaml, and once Cassandra is
restarted, you should run repair to fix the inconsistent data.

Thanks
Anuj
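This suggestion can be sketched as below, assuming GNU sed and a package install with cassandra.yaml at `/etc/cassandra/cassandra.yaml` (`CASSANDRA_YAML` is a hypothetical override); commands are only printed unless `DRY_RUN=0`. auto_bootstrap is not listed in the stock cassandra.yaml and defaults to true, so the helper adds the key if it is missing and rewrites it otherwise.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Set auto_bootstrap: false, adding the key if it is absent.
set_auto_bootstrap_false() {
  yaml=$1
  if grep -q '^auto_bootstrap:' "$yaml"; then
    sed -i 's/^auto_bootstrap:.*/auto_bootstrap: false/' "$yaml"   # GNU sed
  else
    printf 'auto_bootstrap: false\n' >> "$yaml"
  fi
}

restart_keep_data_and_repair() {
  yaml=${CASSANDRA_YAML:-/etc/cassandra/cassandra.yaml}
  run set_auto_bootstrap_false "$yaml"
  # The data, commitlog and saved_caches directories are left untouched.
  run service cassandra restart
  # Bring this node's replicas back in line with the other nodes.
  run nodetool repair
}
```

Unlike the join_ring=false procedure discussed elsewhere in the thread, here the node serves (possibly stale) reads at CL ONE while the repair runs.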
 



RE: handling down node cassandra 2.0.15

2015-11-16 Thread Josh Smith
Did you set the JVM_OPTS to replace address? That is usually the error I get
when I forget to set the replace_address in cassandra-env.sh.

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"


From: Anishek Agarwal [mailto:anis...@gmail.com]
Sent: Monday, November 16, 2015 9:25 AM
To: user@cassandra.apache.org
Subject: Re: handling down node cassandra 2.0.15

Nope, it's not.




Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
Nope, it's not.

On Mon, Nov 16, 2015 at 5:48 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Is that a seed node?


Re: handling down node cassandra 2.0.15

2015-11-16 Thread sai krishnam raju potturi
Is that a seed node?

On Mon, Nov 16, 2015, 05:21 Anishek Agarwal  wrote:

> Hello,
>
> We have a 3-node cluster, and one of the nodes went down, apparently due
> to a hardware memory failure. We followed the steps below after the node
> had been down for longer than the default value of *max_hint_window_in_ms*.
>
> I tried to restart Cassandra by following the steps at:
>
>  1. http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
>  2. http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
> *except the "clear data" part, as it was not specified in the second blog
> post above.*
>
> I was trying to restart the same node that went down; however, I did not
> see the "StorageService" messages in the log files described in link 2.
>
> Instead, it just tried to replay and then stopped with the following
> error:
>
> ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584) Exception encountered during startup
> java.lang.RuntimeException: Cannot replace address with a node that is already bootstrapped
>
> Can someone please tell me if I am doing something wrong here?
>
> Thanks for the help in advance.
>
> Regards,
> Anishek
>