Re: one node down and cluster works better

2020-04-13 Thread Osman Yozgatlıoğlu
Thanks Mehmet and Erick, I don't have any monitoring other than nodetool, but I managed to see some disk errors causing exceptions. I changed the faulty disk and performance is OK now. Regards, Osman On Sun, 5 Apr 2020 at 03:17, Erick Ramirez wrote: > > With only 2 replicas per DC, it means you're likely

Re: one node down and cluster works better

2020-04-04 Thread Erick Ramirez
With only 2 replicas per DC, it means you're likely writing with a consistency level of either ONE or LOCAL_ONE. Every time you hit the problematic node, the write performance drops. All other configurations being equal, this indicates an issue with the commitlog disk on the node. Get your sysadmin
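For context, the layout Erick describes corresponds to a keyspace definition along these lines (keyspace and DC names are illustrative, not from the thread):

    CREATE KEYSPACE app WITH replication =
        {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};

    -- and, in cqlsh, the write consistency he infers:
    CONSISTENCY LOCAL_ONE;

With two replicas per DC at CL ONE/LOCAL_ONE, any write routed to the node with the failing disk stalls, which matches the observed drop.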

Re: one node down and cluster works better

2020-04-04 Thread mehmet bursali
Hi Osman, do you use any monitoring solution such as prometheus on your cluster? If yes, you should install and use the cassandra exporter from the link below and examine some detailed metrics. https://github.com/criteo/cassandra_exporter Sent from Yahoo Mail on Android, 15:53, 4 Apr 2020
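A minimal Prometheus scrape job for that exporter might look like the following sketch; the host and port are placeholders, so check the exporter's README for its actual defaults:

    scrape_configs:
      - job_name: 'cassandra'
        static_configs:
          - targets: ['cassandra-host:9500']   # host/port illustrative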

one node down and cluster works better

2020-04-04 Thread Osman Yozgatlıoğlu
Hello, I manage one cluster with 2 DCs, 7 nodes each, and the replication factor is 2:2. My insertion performance dropped somehow. I restarted the nodes one by one and found one node degrading performance. Verified this node after the problem occurred a couple of times. How can I continue to investigate? Regards,

Cassandra node down metric

2019-07-30 Thread Rahul Reddy
Hello, I'm using the jmx metric org_apache_cassandra_net_failuredetector_downendpointcount to monitor the number of Cassandra nodes down. When we decommission a Cassandra node for any reason (e.g. AWS scheduled retirement), this metric shows the node as down for 72 hours until the gossip is cleared. We want
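As a sketch of the alerting pattern under discussion (assuming the metric is scraped under this exact name, e.g. via a JMX exporter), a Prometheus rule could be:

    - alert: CassandraEndpointDown
      expr: org_apache_cassandra_net_failuredetector_downendpointcount > 0
      for: 10m
      annotations:
        summary: "Failure detector reports {{ $value }} endpoint(s) down"

The thread's complaint is exactly that such a rule keeps firing for 72 hours after a decommission, until gossip forgets the node.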

RE: Jmx metrics shows node down

2019-07-29 Thread ZAIDI, ASAD A
9, 2019 10:56 AM To: user@cassandra.apache.org Subject: Re: Jmx metrics shows node down Is there a workaround to shorten the 72 hours to something shorter? (you said "by default"; wondering if one can set a non-default value?) Thanks, Yuping On Jul 29, 2019, at 7:28 AM, Oleksandr Shulgin mailto:olek

Re: Jmx metrics shows node down

2019-07-29 Thread yuping wang
Is there a workaround to shorten the 72 hours to something shorter? (you said "by default"; wondering if one can set a non-default value?) Thanks, Yuping On Jul 29, 2019, at 7:28 AM, Oleksandr Shulgin wrote: > On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy wrote: > > Decommissioned 2 nodes from clust

Re: Jmx metrics shows node down

2019-07-29 Thread yuping wang
We have the same issue. We observed that the JMX metric only cleared after exactly 72 hours too. On Jul 29, 2019, at 11:23 AM, Rahul Reddy wrote: Also, the system.peers table doesn't have information on the old nodes; the ghost nodes only appear in JMX. > On Mon, Jul 29, 2019, 7:39 AM Rahul Reddy wrote:

Re: Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
Also, the system.peers table doesn't have information on the old nodes; the ghost nodes only appear in JMX. On Mon, Jul 29, 2019, 7:39 AM Rahul Reddy wrote: > We have removed nodes from a cluster many times but never seen the jmx metric > stay down for 72 hours. So it has to be completely removed from

Re: Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
We have removed nodes from a cluster many times but never seen the jmx metric stay down for 72 hours. So it has to be completely removed from gossip for the metric to behave as expected? That would be a problem for using the metric for on-call alerting. On Mon, Jul 29, 2019, 7:28 AM Oleksandr Shulgin < oleksandr.sh

Re: Jmx metrics shows node down

2019-07-29 Thread Oleksandr Shulgin
On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy wrote: > > Decommissioned 2 nodes from the cluster; nodetool status doesn't list the > nodes, as expected, but the jmx metrics still show those 2 nodes as down. > Nodetool gossip shows the 2 nodes in Left state. Why does my jmx still > show those nodes as down ev

Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
Hello, Decommissioned 2 nodes from the cluster; nodetool status doesn't list the nodes, as expected, but the jmx metrics still show those 2 nodes as down. Nodetool gossip shows the 2 nodes in Left state. Why does my jmx still show those nodes as down even after 24 hours? Cassandra version 3.11.3? Anything
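The Left state mentioned here can be inspected, and in stubborn cases cleared, from the command line; a sketch for 3.11.x, with 10.0.0.5 standing in for the ghost node (assassinate is a last resort):

    nodetool status                                # decommissioned node no longer listed
    nodetool gossipinfo | grep -E '^/|STATUS'      # ghost node still shows STATUS:...LEFT
    nodetool assassinate 10.0.0.5                  # force-remove the gossip entry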

2.1 cassandra 1 node down produces replica shortfall

2019-05-17 Thread Carl Mueller
Being one of our largest and unfortunately heaviest multi-tenant clusters, and our last 2.1 prod cluster, we are encountering not-enough-replica errors (need 2, only found 1) after bringing down only 1 node. 90-node cluster, 30 per DC; DCs are in Europe, Asia, and the US, on AWS. Are there bugs for erroneous

Re: cqlsh COPY ... TO ... doesn't work if one node down

2018-07-01 Thread @Nandan@
The CQL COPY command will not work if you are trying to copy from all nodes, because the COPY command checks that all N nodes are UP and RUNNING. If you want it to complete, you have 2 options: 1) remove the down node from the COPY command, or 2) bring it back to UP and NORMAL status. On Mon, Jul 2, 2018 at 9:15

Re: cqlsh COPY ... TO ... doesn't work if one node down

2018-07-01 Thread Anup Shirolkar
Hi, The error shows that the cqlsh connection to the down node failed, so you should debug why that happened. Although you mentioned another node in the cqlsh command ('10.0.0.154'), my guess is that the down node was present in the connection pool, hence a connection to it was attempted. Ideally the avail

cqlsh COPY ... TO ... doesn't work if one node down

2018-06-29 Thread Dmitry Simonov
Hello! I have a cassandra cluster with 5 nodes. There is a (relatively small) keyspace X with RF=5. One node goes down.
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.82  253.64 M
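The failing export was of this general shape (table name illustrative); pointing cqlsh explicitly at a live node and dropping to CONSISTENCY ONE is the usual first thing to try:

    $ cqlsh 10.0.0.154
    cqlsh> CONSISTENCY ONE;
    cqlsh> COPY x.mytable TO 'mytable.csv' WITH HEADER = true;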

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
If you think that will fix the problem, maybe you could add a little more memory to each machine as a short-term fix. From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Wednesday, March 28, 2018 5:24 AM To: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6 nodes

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Xiangfei Ni
Brotman Sent: March 28, 2018 20:16 To: user@cassandra.apache.org Subject: RE: Re: Re: A node down every day in a 6 nodes cluster David, Did you figure out what to do about the data model problem? It could be that your data files finally grew to the point that the data model problem caused the Java heap

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
model. Kenneth Brotman From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] Sent: Wednesday, March 28, 2018 4:46 AM To: 'user@cassandra.apache.org' Subject: RE: Re: Re: A node down every day in a 6 nodes cluster Was any change to hardware done around the time the problem star

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
-dt.com] Sent: Wednesday, March 28, 2018 4:40 AM To: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6 nodes cluster Hi Kenneth, The cluster has been running for 4 months. The problem started last week. Best Regards, 倪项菲 / David Ni, 中移德电网络科技有限公司

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Xiangfei Ni
: Kenneth Brotman Sent: March 28, 2018 19:34 To: user@cassandra.apache.org Subject: RE: Re: Re: A node down every day in a 6 nodes cluster David, How long has the cluster been operating? How long has the problem been occurring? Kenneth Brotman From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Tuesday, March 27

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
David, How long has the cluster been operating? How long has the problem been occurring? Kenneth Brotman From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Tuesday, March 27, 2018 7:00 PM To: Xiangfei Ni Cc: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Jeff Jirsa
Virtue Intelligent Network Ltd, co. > > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei > Mob: +86 13797007811|Tel: + 86 27 5024 2516 > > From: Jeff Jirsa > Sent: March 27, 2018 11:03 > To: user@cassandra.apache.org > Subject: Re: A node down every day in

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
,Wuhan,HuBei Mob: +86 13797007811|Tel: + 86 27 5024 2516 From: Xiangfei Ni Sent: March 28, 2018 9:45 To: Jeff Jirsa Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Hi Jeff, today another node was shut down. I have attached the exception log file; could you

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Rahul Singh
m/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html > > Kenneth Brotman > > > From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] > Sent: Tuesday, March 27, 2018 5:44 AM > To: user@cassandra.apache.org > Subject: Re: RE: Re: A node down every day in a 6 nodes clu

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
/operations/opsReplaceLiveNode.html Kenneth Brotman From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 5:44 AM To: user@cassandra.apache.org Subject: Re: RE: Re: A node down every day in a 6 nodes cluster Thanks, Kenneth, this is a production database, and it is

Re: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
[mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 3:27 AM To: Jeff Jirsa Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Thanks Jeff, so your suggestion is to first resolve the data model issue which causes the wide partitions, right? Best Regards

RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
David, Can you replace the misbehaving node to see if that resolves the problem? Kenneth Brotman From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 3:27 AM To: Jeff Jirsa Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Only one node having the problem is suspicious. It may be that your application is improperly pooling connections, or you have a hardware problem. I don't see anything in nodetool that explains it, though you certainly have a data

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Xiangfei Ni
27 5024 2516 From: daemeon reiydelle Sent: March 27, 2018 11:42 To: user Subject: Re: Re: A node down every day in a 6 nodes cluster Look for errors on your network interface. I think you have periodic errors in your network connectivity <==> "Who do you think made the first stone

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Jeff Jirsa
+86 13797007811|Tel: + 86 27 5024 2516 > > From: Jeff Jirsa > Sent: March 27, 2018 11:03 > To: user@cassandra.apache.org > Subject: Re: A node down every day in a 6 nodes cluster > > That wa

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread daemeon reiydelle
From: Jeff Jirsa > Sent: March 27, 2018 11:03 > To: user@cassandra.apache.org > Subject: Re: A node down every day in a 6 nodes cluster > > That warning isn’t sufficient to understand why the node is going down

Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Jeff Jirsa
That warning isn’t sufficient to understand why the node is going down. Cassandra 3.9 has some pretty serious known issues; upgrading to 3.11.3 is likely a good idea. Are the nodes coming up on their own, or are you restarting them? Paste the output of nodetool tpstats and nodetool cfstats

A node down every day in a 6 nodes cluster

2018-03-26 Thread Xiangfei Ni
Hi Cassandra experts, I am facing an issue: a node goes down every day in a 6-node cluster. The cluster is in just one DC. Every node has 4 cores and 16G RAM, the heap configuration is MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m, every node holds about 200G of data, and the RF for the business CF is 3. A node goes down one tim
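Those heap numbers map to the following lines in cassandra-env.sh (values as reported in the thread; whether an 8G heap on a 16G host with a 512M new gen is appropriate depends on the workload and collector):

    MAX_HEAP_SIZE="8192M"   # half of the 16G RAM on each node
    HEAP_NEWSIZE="512M"     # CMS young gen; the shipped guidance is roughly 100M per core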

Re: Not marking node down due to local pause

2017-10-20 Thread Alexander Dejanovski
Hi John, the other main source of STW pauses in the JVM is the safepoint mechanism: http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html If you turn on full GC logging in your cassandra-env.sh file, you will find lines like this: 2017-10-09T20:13:42.462+: 4.890: Total time for wh
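On Java 8 those diagnostics can be enabled in cassandra-env.sh with flags along these lines (a sketch; exact flags vary by JVM version):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"

PrintGCApplicationStoppedTime records every stop-the-world interval, not just GC, so safepoint pauses show up too.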

Not marking node down due to local pause

2017-10-19 Thread John Sanda
I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a lot of these messages in both logs: WARN 07:23:16 Not marking nodes down due to local pause of 7219277694 > 50 I am fairly certain that they are not due to GC. I am not seeing a whole lot of GC being logged and nothing o

Re: Node down during move

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky wrote: > just a follow up. We've seen this behavior multiple times now. It seems > that the receiving node loses connectivity to the cluster and thus > thinks that it is the sole online node, whereas the rest of the cluster > thinks that it is the only

Re: Node down during move

2014-12-23 Thread Jiri Horky
Hi, just a follow up. We've seen this behavior multiple times now. It seems that the receiving node loses connectivity to the cluster and thus thinks that it is the sole online node, whereas the rest of the cluster thinks that it is the only offline node, really just after the streaming is over. I

Node down during move

2014-12-19 Thread Jiri Horky
Hi list, we added a new node to an existing 8-node cluster with C* 1.2.9 without vnodes and, because we are almost totally out of space, we are shuffling the tokens of one node after another (not in parallel). During one of these move operations, the receiving node died and thus the streaming failed: W

Re: node down = log explosion?

2013-01-23 Thread aaron morton
> The number of in flight hints is greater than… >> private static volatile int maxHintsInProgress = 1024 * Runtime.getRuntime().availableProcessors(); >> You may be able to work around this by reducing the max_hint_window_in_ms >> in the yaml file
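The setting in question lives in cassandra.yaml; a sketch of the suggested workaround (the value shown is the 1-minute example from the thread, far below the shipped default, which is on the order of hours):

    # stop generating hints for a dead node once it has been down this long
    max_hint_window_in_ms: 60000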

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
s(); > > You may be able to work around this by reducing the max_hint_window_in_ms > in the yaml file so that hints are recorded if say the node has been down > for more than 1 minute. > > Anyways I would say your test showed that the current cluster does not > have sufficie

Re: node down = log explosion?

2013-01-22 Thread aaron morton
Anyways I would say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use nodes with more cores, adjust the HH settings, or reduce the throughput.

Re: node down = log explosion?

2013-01-22 Thread Rob Coli
On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir wrote: > Do you have a suggestion as to what could be a better fit for counters? > Something that can also replicate across DCs and survive link breakdown > between nodes (across DCs)? (and no, I don't need 100.00% precision > (although it would be ni

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
> We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). > ... > This led to a complete explosion of logs on the remaining alive node in

Re: node down = log explosion?

2013-01-22 Thread Rob Coli
l. > We wanted to test what happens if one node goes down, so we brought one node > down in DC1 (i.e. the node that was handling half of the incoming writes). > ... > This led to a complete explosion of logs on the remaining alive node in DC1. I agree, this level of exception logging dur

node down = log explosion?

2013-01-22 Thread Sergey Olefir
backup). In total there are 100 separate clients executing 1-2 batch updates per second. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). This led to a complete explosion of logs on the remaining

Re: Node down

2012-02-02 Thread aaron morton
on? > > Thanks! > > Rene > > From: aaron morton [mailto:aa...@thelastpickle.com] > Sent: Wednesday, February 1, 2012 21:03 > To: user@cassandra.apache.org > Subject: Re: Node down > > Without knowing too much more information I would try this… > > * R

RE: Node down

2012-02-02 Thread Rene Kochen
ring view". Can it be that this stored ring view was out of sync with the actual (gossip) situation? Thanks! Rene From: aaron morton [mailto:aa...@thelastpickle.com] Sent: woensdag 1 februari 2012 21:03 To: user@cassandra.apache.org Subject: Re: Node down Without knowing too much more inf

Re: Node down

2012-02-01 Thread aaron morton
Without knowing too much more information I would try this… * Restart each node in turn, watch the logs to see what it says about the other. * If that restart did not fix it, try using the -Dcassandra.load_ring_state=false JVM option when starting the node. That will tell it to ignore it'
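That second option is passed as a JVM system property, e.g. by appending a line like this to cassandra-env.sh before restarting the node:

    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"   # discard the saved ring view, rebuild it from gossip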

Node down

2012-02-01 Thread Rene Kochen
I have a cluster with seven nodes. If I run the nodetool ring command on all nodes, I see the following: Node1 says that node2 is down. Node2 says that node1 is down. All other nodes say that everyone is up. Is this normal behavior? I see no network-related problems. Also no problems between

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
>  Thank you for your explanations. Even with a RF=1 and one node down I don't > understand why I can't at least read the data in the nodes that are still > up? You will be able to read data for row keys that do not live on the node that is down. But for any request to a row w

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Alexandru Dan Sicoe
Hi Peter, Thank you for your explanations. Even with RF=1 and one node down, I don't understand why I can't at least read the data on the nodes that are still up? Also, why can't I at least perform writes with consistency level ANY and failover policy ON_FAIL_TRY_ALL_AVAILABLE.

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
> If you want to survive node failures, use an RF above 1. And then make > sure to use an appropriate consistency level. To elaborate a bit: RF, or replication factor, is the *total* number of copies of any piece of data in the cluster. So with only one copy, the data will not be available when a
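In today's CQL, raising the replication factor on an existing keyspace is a schema change followed by a repair; a minimal sketch (keyspace name illustrative, and on the 0.8-era clusters in this thread the same change was made through cassandra-cli):

    ALTER KEYSPACE app
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
    -- then, on each node, populate the new replicas:
    --   nodetool repair app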

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
> took a node down to see how it behaves. All of a sudden I couldn't write or [snip] > me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be [snip] >     Default replication factor = 1 So you have an RF=1 cluster (only one copy of data) and you bring a n

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Alexandru Dan Sicoe
Hi guys, It's interesting to see this thread. I recently discovered a similar problem on my 3 node Cassandra 0.8.5 cluster. It was working fine, then I took a node down to see how it behaves. All of a sudden I couldn't write or read because of this exception being thrown: Exception

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread R. Verlangen
I'm currently having a similar problem with a 2-node cluster. When I > shut down > >> one of the nodes, the other isn't responding any more. > >> > >> Did you find a solution for your problem? > >> > >> /I'm new to mailing lists, if i

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Jonathan Ellis
problem? >> >> /I'm new to mailing lists, if it's inappropriate to reply here, please let >> me know../ >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html >> http://cassandra-user-inc

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Javier Canillas
> /I'm new to mailing lists, if it's inappropriate to reply here, please > let > > me know../ > > > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html > > > http://cassandra-user-incubator-

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Jonathan Ellis
the nodes, the other isn't responding any more. > > Did you find a solution for your problem? > > /I'm new to mailing lists, if it's inappropriate to reply here, please let > me know../ > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
Thank you very much Jake! It solved the problem. All reads and writes are working now. Have a nice day! -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-tp6936722p6936947.html Sent from the cassandra-u

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread Jake Luciani
e in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-tp6936722p6936912.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. > -- http://twitter.com/tjake

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
I'm reading with: cassandra_ConsistencyLevel::ANY (phpcassa lib). Is there any way to verify that all the nodes know that they are RF=2 ? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-fa

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread Jake Luciani
exception > 'cassandra_UnavailableException' > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-tp6936722p6936869.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at >

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
The error I currently see when I take down node B: Error performing get_indexed_slices on NODE A IP:9160: exception 'cassandra_UnavailableException' -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overa

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread Jake Luciani
82254385124880979556330753059704699 > IP-Of-Node-A datacenter1 rack1 Up Normal 2.73 MB > 55.00% 167057712653383445280042298172156091026 > > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-do

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread RobinUs2
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html -- View this message in context: http://cassandra-user-incubato

2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
80042298172156091026 -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-tp6936722p6936722.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
And fixed! A co-worker put in a bad host-line entry last night that threw it all off :( Thanks for the assist, guys. -- Ray Slakinski On Wednesday, July 13, 2011 at 1:32 PM, Ray Slakinski wrote: > Was all working before, but we ran out of file handles and ended up > restarting the nodes. No

Re: One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
Was all working before, but we ran out of file handles and ended up restarting the nodes. No yaml changes have occurred. Ray Slakinski On 2011-07-13, at 12:55 PM, Sasha Dolgy wrote: > any firewall changes? ping is fine ... but if you can't get from > node(a) to nodes(n) on the specific ports

Re: One node down but it thinks its fine...

2011-07-13 Thread Sasha Dolgy
any firewall changes? ping is fine ... but if you can't get from node(a) to nodes(n) on the specific ports... On Wed, Jul 13, 2011 at 6:47 PM, samal wrote: > Check seed ip is same in all node and should not be loopback ip on cluster. > > On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski > wrote: >

Re: One node down but it thinks its fine...

2011-07-13 Thread samal
Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski wrote: > One of our nodes, which happens to be the seed, thinks it's up and all the > other nodes are down. However all the other nodes think the seed is down > instead. The l
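The seed list lives in cassandra.yaml and should be identical, and a routable (non-loopback) address, on every node; a sketch:

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.1.1.10,10.1.1.11"   # same list on every node; never 127.0.0.1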

One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However all the other nodes think the seed is down instead. The logs for the seed node show everything is running as it should be. I've tried restarting the node, turning on/off gossip and thrift and

RE: Reboot, now node down 0.8rc1

2011-05-24 Thread Scott McPheeters
; Sent: Monday, May 23, 2011 6:42 PM > To: user@cassandra.apache.org > Subject: Re: Reboot, now node down 0.8rc1 > > You could have removed the affected commit log file and then run a > nodetool repair after the node had started. > > It would be handy to have some more context for the proble

Re: Reboot, now node down 0.8rc1

2011-05-24 Thread Sylvain Lebresne
3, 2011 6:42 PM > To: user@cassandra.apache.org > Subject: Re: Reboot, now node down 0.8rc1 > > You could have removed the affected commit log file and then run a > nodetool repair after the node had started. > > It would be handy to have some more context for the problem. Was this

RE: Reboot, now node down 0.8rc1

2011-05-24 Thread Scott McPheeters
@cassandra.apache.org Subject: Re: Reboot, now node down 0.8rc1 You could have removed the affected commit log file and then run a nodetool repair after the node had started. It would be handy to have some more context for the problem. Was this an upgrade from 0.7 or a fresh install? If you are

Re: Reboot, now node down 0.8rc1

2011-05-23 Thread aaron morton
letely > what the commitlog is? > > > Scott > > > -Original Message- > From: Scott McPheeters [mailto:smcpheet...@healthx.com] > Sent: Monday, May 23, 2011 2:18 PM > To: user@cassandra.apache.org > Subject: Reboot, now node down 0.8rc1 > > I have a

RE: Reboot, now node down 0.8rc1

2011-05-23 Thread Scott McPheeters
n the node and bring it back? Or am I missing completely what the commitlog is? Scott -Original Message- From: Scott McPheeters [mailto:smcpheet...@healthx.com] Sent: Monday, May 23, 2011 2:18 PM To: user@cassandra.apache.org Subject: Reboot, now node down 0.8rc1 I have a test node s

Reboot, now node down 0.8rc1

2011-05-23 Thread Scott McPheeters
I have a test node system running release 0.8rc1. I rebooted node3 and now Cassandra is failing on startup. Any ideas? I am not sure where to begin. Debian 6, plenty of disk space, Cassandra 0.8rc1 INFO 13:48:58,192 Creating new commitlog segment /home/cassandra/commitlog/CommitLog-130617293

Re: Determining the issues of marking node down

2011-04-30 Thread aaron morton
If the node is crashing with OutOfMemory it will be in the cassandra logs. Search them for "ERROR". Alternatively if you've installed a package the stdout and stderr may be redirected to a file called something like output.log in the same location as the log file. You can change the logging usi
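A quick way to run that search across the usual log locations (paths are the common package defaults; adjust for your install):

    grep -iE 'ERROR|OutOfMemory' /var/log/cassandra/system.log /var/log/cassandra/output.log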

Determining the issues of marking node down

2011-04-30 Thread Rauan Maemirov
I have a test cluster with 3 nodes; earlier I installed OpsCenter to watch my cluster. Every day I see that the same node goes down (at a different time, but every day). Then I just run `service cassandra start` to fix that problem. system.log doesn't show me anything strange. What are the st

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
eport it being down. We are running a 9-node cluster with RF=3, all reads and writes at quorum. I was making the same assumption you are, that an operation would complete fine at quorum with only one node down since the other two nodes would be able to respond. Justin On Wed, Sep 29, 2010 at 5:

Re: Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
It seems to be about 15 seconds after killing a node before the other nodes report it being down. We are running a 9 node cluster with RF=3, all reads and writes at quorum. I was making the same assumption you are, that an operation would complete fine at quorum with only one node down since the

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
Ah, that was not exactly what you were after. I do not know how long it takes gossip / the failure detector to detect a down node. In your case, what is the CL you're using for reads and what is your RF? The hope would be that taking one node down at a time would leave enough servers running to

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
:15 AM, Justin Sanders wrote: I looked through the documentation but couldn't find anything. I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up. The

Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
I looked through the documentation but couldn't find anything. I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up. The reason I ask is because w

Re: node down window

2010-07-14 Thread Jonathan Ellis
Coordination in a distributed system is difficult. I don't think we can fix HH's existing edge cases, without introducing other more complicated edge cases. So weekly-or-so repair will remain a common maintenance task for the forseeable future. On Wed, Jul 14, 2010 at 4:17 PM, B. Todd Burruss w

Re: node down window

2010-07-14 Thread B. Todd Burruss
thx, but disappointing :) is this just something we have to live with and periodically "repair" the nodes? or is there future work to tighten up the window? thx On Wed, 2010-07-14 at 12:13 -0700, Jonathan Ellis wrote: > On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss wrote: > > there is a wi

Re: node down window

2010-07-14 Thread Jonathan Ellis
On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss wrote: > there is a window of time from when a node goes down and when the rest > of the cluster actually realizes that it is down. > > what happens to writes during this time frame?  does hinted handoff > record these writes and then "handoff" when

node down window

2010-07-14 Thread B. Todd Burruss
there is a window of time from when a node goes down and when the rest of the cluster actually realizes that it is down. what happens to writes during this time frame? does hinted handoff record these writes and then "handoff" when the down node returns? or does hinted handoff not kick in until

Re: UnavailableException with 1 node down and RF=2?

2010-07-01 Thread Jonathan Ellis
; > >> > Sent from my iPhone. >> > >> > On 2010-07-01, at 1:39 AM, Benjamin Black wrote: >> > >> >> .QUORUM or .ALL (they are the same with RF=2). >> >> >> >> On Wed, Jun 30, 2010 at 10:22 PM, James Golick >> >> wrote: >>

Re: UnavailableException with 1 node down and RF=2?

2010-07-01 Thread James Golick
> > Oops. I meant to say that I'm reading with CL.ONE. > > > > J. > > > > Sent from my iPhone. > > > > On 2010-07-01, at 1:39 AM, Benjamin Black wrote: > > > >> .QUORUM or .ALL (they are the same with RF=2). > >> > >>

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread Jonathan Ellis
Black wrote: > >> .QUORUM or .ALL (they are the same with RF=2). >> >> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote: >>> 4 nodes, RF=2, 1 node down. >>> How can I get an UnavailableException in that scenario? >>> - J. > -- Jonatha

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread James Golick
Oops. I meant to say that I'm reading with CL.ONE. J. Sent from my iPhone. On 2010-07-01, at 1:39 AM, Benjamin Black wrote: > .QUORUM or .ALL (they are the same with RF=2). > > On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote: >> 4 nodes, RF=2, 1 node down.

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread Benjamin Black
.QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote: > 4 nodes, RF=2, 1 node down. > How can I get an UnavailableException in that scenario? > - J.
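The arithmetic behind that: quorum is floor(RF/2) + 1, so with RF=2 it is floor(2/2) + 1 = 2, i.e. every replica, which is exactly ALL. Any QUORUM request touching a range owned by the down node therefore cannot be satisfied, whatever the cluster size.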

UnavailableException with 1 node down and RF=2?

2010-06-30 Thread James Golick
4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J.