Re: Cluster not accepting insert while one node is down

2013-02-14 Thread Alain RODRIGUEZ
Hi Traian,

There is your problem. You are using RF=1, meaning that each node is
responsible for its own range and nothing more. So when a node goes down, do
the math: you just can't read 1/5 of your data.

This is very good for performance, since each node owns its own part of the
data and any write or read needs to reach only one node, but it leaves every
node as a SPOF, while removing the SPOF is a main point of using C*. So you
have poor availability and no tolerance to node failures.

A usual configuration with 5 nodes would be RF=3 and CL=QUORUM for both
reads and writes.

This replicates your data to 2 nodes in addition to the natural endpoint (3
of the 5 nodes own any given piece of data), and any read or write has to
reach at least 2 of those replicas before being considered successful. Since
2 + 2 > 3, the read and write quorums always overlap, which gives you strong
consistency.

This configuration allows you to take a node down (crash, configuration
update, rolling restart) without degrading the service (you can still reach
all of your data), at the cost of storing more data on each node.
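
For example, a minimal sketch of that change with cassandra-cli and nodetool,
assuming the TestSpace keyspace and the datacenter1 name shown in the describe
output quoted below (run the repair on each node so existing rows get copied to
their new replicas):

[default@TestSpace] update keyspace TestSpace with strategy_options = {datacenter1 : 3};

$ nodetool repair TestSpace

After that, reads and writes at QUORUM keep succeeding while any single node is
down.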

Alain


2013/2/14 Traian Fratean traian.frat...@gmail.com

 I am using the defaults for both RF and CL. As the keyspace was created using
 cassandra-cli, the default RF should be 1, as shown below:

 [default@TestSpace] describe;
 Keyspace: TestSpace:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
     Options: [datacenter1:1]

 As for the CL, it is the Astyanax default, which is ONE for both reads and
 writes.
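
 For reference, a minimal sketch of where those defaults could be overridden
 when building the Astyanax context; the cluster name, pool name and seed host
 below are only illustrative:

 import com.netflix.astyanax.AstyanaxContext;
 import com.netflix.astyanax.Keyspace;
 import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
 import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
 import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
 import com.netflix.astyanax.model.ConsistencyLevel;
 import com.netflix.astyanax.thrift.ThriftFamilyFactory;

 AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
     .forCluster("TestCluster")                  // illustrative cluster name
     .forKeyspace("TestSpace")
     .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
         // override the CL_ONE defaults mentioned above
         .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)
         .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM))
     .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
         .setSeeds("10.60.15.66:9160"))          // any reachable node
     .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
     .buildKeyspace(ThriftFamilyFactory.getInstance());
 context.start();
 Keyspace keyspace = context.getClient();        // getEntity() on older Astyanax versions

 QUORUM only helps once RF is raised above 1, of course.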

 Traian.


 2013/2/13 Alain RODRIGUEZ arodr...@gmail.com

 We probably need more info, like the RF of your cluster and the CL of your
 reads and writes. Could you also tell us whether you use vnodes or not?

 I heard that Astyanax was not running very smoothly on 1.2.0, but a bit
 better on 1.2.1. Still, Netflix hasn't released a version of Astyanax for
 C* 1.2 yet.

 Alain


 2013/2/13 Traian Fratean traian.frat...@gmail.com

 Hi,

 I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java
 client with Astyanax 1.56.21.
 When a node (10.60.15.67 - *different* from the one in the stacktrace
 below) went down I got a TokenRangeOfflineException and no other data got
 inserted into *any other* node of the cluster.

 Am I having a configuration issue, or is this supposed to happen?


 com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81)
 - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
 com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
 at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
 at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
 at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
 at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
 at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
 at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)



 Thank you,
 Traian.






Re: Cluster not accepting insert while one node is down

2013-02-14 Thread Alain RODRIGUEZ
I will let committers or anyone with knowledge of Cassandra internals
answer this.

From what I understand, you should be able to insert data on any up node
with your configuration...

Alain


2013/2/14 Traian Fratean traian.frat...@gmail.com

 You're right regarding data availability on that node. And my config,
 being the default one, is not suited for a cluster.
 What I don't get is that node 67 was down and I was trying to insert into
 node 66, as can be seen from the stacktrace. Long story short: when node 67
 was down I could not insert into any machine in the cluster. Not what I was
 expecting.

 Thank you for the reply!
 Traian.



Re: Cluster not accepting insert while one node is down

2013-02-14 Thread Bryan Talbot
Generally, data isn't written to whatever node the client connects to. In
your case, a row is written to one of the nodes based on the hash of the
row key. If that one replica node is down, it doesn't matter which
coordinator node you send the write to with CL.ONE: the write will fail.

If you want the write to succeed, you could do any one of: write with
CL.ANY, increase RF to 2+, or write using a row key that hashes to an UP
node.
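
As a rough sketch of the first option with Astyanax (the column family, row key
and column here are made up, and CL.ANY only means the coordinator may store a
hint; the data stays unreadable until the real replica is back up):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.StringSerializer;

ColumnFamily<String, String> CF_DATA =
    new ColumnFamily<String, String>("Data", StringSerializer.get(), StringSerializer.get());

// keyspace: an Astyanax Keyspace client built elsewhere
MutationBatch batch = keyspace.prepareMutationBatch()
    .setConsistencyLevel(ConsistencyLevel.CL_ANY);  // accept the write even if the replica is down
batch.withRow(CF_DATA, "some-row-key")
    .putColumn("some-column", "some-value", null);  // null = no TTL
batch.execute();                                    // throws ConnectionException on failure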

-Bryan




Re: Cluster not accepting insert while one node is down

2013-02-14 Thread Wei Zhu
From the exception, it looks like Astyanax didn't even try to call Cassandra.
My guess is that Astyanax is token aware: it detects that the replica node is
down and doesn't even attempt the call. If you used Hector, it might try the
write since it's not token aware, but as Bryan said, it would eventually fail
anyway. I guess hinted handoff won't help either, since the write can't
satisfy CL.ONE.
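
For what it's worth, whether the client routes requests by token is a pool
setting; a minimal sketch of the usual token-aware configuration (the values
shown are the common ones, not necessarily what Traian uses):

import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;

AstyanaxConfigurationImpl config = new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)       // learn the ring from the cluster
    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE);  // route each request to a replica for its key

Either way, with RF=1 and the only replica down, the write fails no matter
which node it is routed to.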



Re: Cluster not accepting insert while one node is down

2013-02-13 Thread Alain RODRIGUEZ
We probably need more info, like the RF of your cluster and the CL of your
reads and writes. Could you also tell us whether you use vnodes or not?

I heard that Astyanax was not running very smoothly on 1.2.0, but a bit
better on 1.2.1. Still, Netflix hasn't released a version of Astyanax for
C* 1.2 yet.

Alain


2013/2/13 Traian Fratean traian.frat...@gmail.com

 Hi,

 I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java client
 with Astyanax 1.56.21.
 When a node (10.60.15.67 - *different* from the one in the stacktrace
 below) went down I got a TokenRangeOfflineException and no other data got
 inserted into *any other* node of the cluster.

 Am I having a configuration issue, or is this supposed to happen?


 com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81)
 - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
 com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
 at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
 at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
 at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
 at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
 at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
 at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)



 Thank you,
 Traian.