MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
Hi

As advised by Jonathan, you need to write a multithreaded client to insert
data into Cassandra more quickly. Now I have a problem with Thrift's
TTransport. As required, I open a TSocket in each thread and create a
Cassandra client on that socket.

The code is simple:

public void run() {
    TSocket tSocket = null;
    try {
        tSocket = new TSocket(server, port);
        TBinaryProtocol protocol = new TBinaryProtocol(tSocket);
        clientCassandra = new Client(protocol);
        tSocket.open();
        updateCassandra(clientCassandra);
    } catch (Exception e) {
        // log the failure
    } finally {
        if (tSocket != null) {
            tSocket.close();
        }
        countDownLatch.countDown();
    }
}


After a few thousand inserts I always get:

org.apache.thrift.transport.TTransportException:
java.net.NoRouteToHostException: Can't assign requested address
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at tv.bee.hiveplus.crud.CassandraThread.call(CassandraThread.java:297)
at tv.bee.hiveplus.crud.CassandraThread.call(CassandraThread.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
Caused by: java.net.NoRouteToHostException: Can't assign requested address
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
at java.net.Socket.connect(Socket.java:525)
at java.net.Socket.connect(Socket.java:475)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 7 more

I've searched, and it seems you get this when there are no more ports to open
on the machine, but I close each connection after I open it. I've also tried
to keep the threads from flooding the server: I track the list of submitted
tasks and wait until those tasks finish. Here is my code:

CassandraThread cassandraThread = new CassandraThread(server, port,
        operation, rowData, countDownLatch);
listOfTask.add(cassandraThread);

if (listOfTask.size() == 1000) {
    try {
        System.out.println("Invoke");
        executor.invokeAll(listOfTask);
        countDownLatch.await();
        if (countDownLatch.getCount() == 0) {
            countDownLatch = new CountDownLatch(1000);
            listOfTask.clear();
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
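An alternative to the latch-and-rebatch pattern above (an editor's sketch, not code from the thread) is to let a bounded executor queue do the throttling: when the queue fills up, `CallerRunsPolicy` makes the submitting thread run the task itself, which naturally slows submission without any manual latch bookkeeping.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledExecutor {
    // Runs nTasks no-op tasks through a bounded pool; returns how many completed.
    static int runTasks(int nTasks, int nThreads) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),              // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy()); // throttles the submitter
        for (int i = 0; i < nTasks; i++) {
            // stand-in for updateCassandra(...)
            executor.execute(done::incrementAndGet);
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + runTasks(1000, 8)); // completed=1000
    }
}
```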

The number of connections is not high, so why do I get this error? Thanks for
any help.

Best Regards

Richard


Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
Ok,

After digging: the problem is related to opening sockets; a connection stays
in TIME_WAIT status for a while before it is really closed. So if you flood
the server with connections, you get this error once no free ports remain.

To prevent this, the standard Java Socket implements the method
setReuseAddress. Unfortunately, I think it's not implemented in Thrift's
TSocket. Is there any solution other than dropping Thrift?

thanks
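For what it's worth, the underlying java.net.Socket does expose both of the options discussed in this thread. A minimal JDK-only sketch (an editor's illustration, no Thrift involved) of setting and reading them back on an unconnected socket:

```java
import java.net.Socket;

public class SocketOptionsSketch {
    // Applies the two options discussed in the thread and returns the
    // effective SO_LINGER value (0 means linger enabled with a zero timeout).
    static int configure(Socket socket) throws Exception {
        socket.setReuseAddress(true); // allow rebinding an address in TIME_WAIT
        socket.setSoLinger(true, 0);  // close() aborts with RST (risky)
        return socket.getSoLinger();
    }

    public static void main(String[] args) throws Exception {
        Socket socket = new Socket(); // unconnected; options can be set before connect()
        int linger = configure(socket);
        System.out.println("reuseAddress=" + socket.getReuseAddress()
                + " linger=" + linger); // reuseAddress=true linger=0
        socket.close();
    }
}
```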




Re: MultiThread Client problem with thrift

2009-12-22 Thread Jaakko
Hi,

I don't know the particulars of java implementation, but if it works
the same way as Unix native socket API, then I would not recommend
setting linger to zero.

SO_LINGER option with zero value will cause TCP connection to be
aborted immediately as soon as the socket is closed. That is, (1)
remaining data in the send buffer will be discarded, (2) no proper
disconnect handshake and (3) receiving end will get TCP reset.

Sure, this will avoid the TIME_WAIT state, but TIME_WAIT is our friend: it
is there to prevent packets from an old connection being delivered to a new
incarnation of that connection. Instead of avoiding the state, the
application should be changed so that TIME_WAIT is not a problem.
How many open files do you see when the exception happens? It might be
that you're out of file descriptors.

-Jaakko


On Tue, Dec 22, 2009 at 8:17 PM, Richard Grossman richie...@gmail.com wrote:
 Hi
 To all who are interested: I've found a solution that seems not recommended,
 but it works. When opening a socket, set this:
    tSocket.getSocket().setReuseAddress(true);
    tSocket.getSocket().setSoLinger(true, 0);
 It prevents having a lot of connections in TIME_WAIT state, but it's not
 recommended.
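Jaakko's file-descriptor question can be answered from inside the JVM on Unix-like systems; a small sketch (an editor's addition, not from the thread) using the com.sun.management extension:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    // Returns the number of open file descriptors, or -1 if the platform
    // MXBean does not expose the count (e.g. on Windows).
    static long openFds() {
        Object os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean unix) {
            return unix.getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFds());
    }
}
```

Logging this value just before each TSocket.open() would show whether the process is approaching its descriptor limit when the exception fires.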



Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
I agree it solves my problem, but it may create a bigger one.
The problem is that I can't manage to avoid opening a lot of connections.



Re: MultiThread Client problem with thrift

2009-12-22 Thread Ran Tavory
Would connection pooling work for you?
This Java client http://code.google.com/p/cassandra-java-client/ has
connection pooling.
I haven't put the client under stress yet so I can't vouch for it, but it may
be a good solution for you.


Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
Yes, of course, but have you updated it to Cassandra 0.5.0-beta2?


TimeUUID Partitioning

2009-12-22 Thread Daniel Lundin
I'm pondering order preservation and TimeUUID keys, in particular how to
get distribution across the cluster while maintaining rangeability.

Basically, I'm working on a logging app, where rows are TimeUUIDs. To be
able to do range scans we're using OrderPreservingPartitioner.

To get partitioning working, I've currently transformed keys, prepending
a partitioning token (in my testcase, the day-of-week).
Basically, this means two range queries to get data for a set spanning
two days. Crude, but kinda works, and the specialization is alright for
my case. But it feels a bit hackish, so I began studying the partitioner
code a bit, seeking enlightenment.

Has anybody already spent energy + time thinking about generic TimeUUID
partitioning? Seems like it could be a useful thing, since time series
data is quite common.

Perhaps a TimeUUIDPartitioner with configurable time resolution for
tokenization (token = uuid.time % resolution, more or less) would be
sufficient?

Or could it be even more general, i.e., no configuration necessary?

/d
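Daniel's prefix-token idea can be sketched as follows (an editor's illustration; the fixed-width bucket format and day resolution are assumptions, not code from the thread):

```java
public class TimeBucketKey {
    // Prepends a zero-padded time bucket so keys sort by bucket first while
    // remaining range-scannable within each bucket (token = time / resolution).
    static String key(long timestampMillis, String id, long resolutionMillis) {
        long bucket = timestampMillis / resolutionMillis;
        return String.format("%012d-%s", bucket, id);
    }

    public static void main(String[] args) {
        long day = 86_400_000L; // one day in milliseconds
        System.out.println(key(1_261_440_000_000L, "uuid-a", day)); // 000000014600-uuid-a
        System.out.println(key(1_261_526_400_000L, "uuid-b", day)); // 000000014601-uuid-b
    }
}
```

A range query whose window spans two buckets then becomes two scans, one per bucket prefix, which matches the crude-but-working behavior described above.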


Re: MultiThread Client problem with thrift

2009-12-22 Thread Ran Tavory
I don't have a 0.5.0-beta2 version, no. It's not too difficult to add, but I
haven't done so myself; I'm using 0.4.2.



Re: TimeUUID Partitioning

2009-12-22 Thread Richard Grossman
Same problem here.
But I can't understand why one would use TimeUUID instead of just a long; it
does the same job and is much simpler.




Re: MultiThread Client problem with thrift

2009-12-22 Thread Ran Tavory
Not an expert in this field, but I think what you want is to use a connection
pool and NOT close the connections - reuse them. Only idle connections are
released after, say, 1 sec. Also, with a connection pool it's easy to
throttle the application: you can tell the pool to block once all 50
connections (or however many you allow) are in use.
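Ran's blocking-pool idea can be sketched generically (an editor's illustration; the pool element type and sizes are placeholders, not the actual client API):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

public class BlockingPool<T> {
    private final BlockingQueue<T> idle;

    // Pre-creates maxSize connections; borrow() blocks when all are in use,
    // throttling callers instead of letting them open ever more sockets.
    public BlockingPool(int maxSize, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(maxSize);
        for (int i = 0; i < maxSize; i++) {
            idle.add(factory.get());
        }
    }

    public T borrow() throws InterruptedException {
        return idle.take(); // blocks until a connection is returned
    }

    public void release(T connection) {
        idle.offer(connection);
    }

    public static void main(String[] args) throws InterruptedException {
        // Object stands in for a real client connection.
        BlockingPool<Object> pool = new BlockingPool<>(2, Object::new);
        Object c = pool.borrow(); // reuse instead of open/close per request
        pool.release(c);
        System.out.println("borrowed and released");
    }
}
```

Because connections are reused rather than closed, they never enter TIME_WAIT on the client side, which sidesteps the ephemeral-port exhaustion discussed earlier in the thread.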

On Tue, Dec 22, 2009 at 4:01 PM, Richard Grossman richie...@gmail.comwrote:

 So I can't use it.

 But I've made my own connection pool. It doesn't fix anything, because the
 problem is below Java: the socket is closed, and Java considers it closed,
 but the system keeps it in the TIME_WAIT state, so the port is actually
 still in use.

 So my question is: has anyone managed to open many connections and get rid
 of TIME_WAIT, in any language - PHP, Python, etc.?

 Thanks


Re: MultiThread Client problem with thrift

2009-12-22 Thread matthew hawthorne

I did something very similar to this. A difference in my approach is that I
did not release idle connections after a specific time period; instead I
performed a liveness check on each connection after obtaining it from the
pool, like this:

// get client connection from pool
Cassandra.Client client =

try {
  client.getInputProtocol().getTransport().flush();
} catch (TTransportException e) {
  // connection is invalid, obtain new connection
}

It seemed to work during my testing, not sure if the thrift specifics
are 100% correct (meaning I'm not sure if the catch block will work
for all situations involving stale or expired connections).

-matt


Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
OK, I get that; of course this problem can be solved by lowering the load on
the server however you want: a connection pool, or less aggressive thread
management. That's not my goal - I would like to keep the server under high
pressure.

Finally I managed to find a solution by lowering the TIME_WAIT duration on
my machine. You need to make the adjustment on every machine, and it's
system-specific. On Mac OS X it's described here:
http://www.brianp.net/2008/10/03/changing-the-length-of-the-time_wait-state-on-mac-os-x/

On Linux it's easier to find.

Thanks



Potential problem with 0.5 branch (Possibly in gossiping?)

2009-12-22 Thread Ramzi Rabah
I just recently upgraded to latest in 0.5 branch, and I am running
into a serious issue. I have a cluster with 4 nodes, rackunaware
strategy, and using my own tokens distributed evenly over the hash
space. I am reading and writing to them equally, at a rate of about
230 reads/writes per second (and cfstats shows that). The first 3 nodes
are seeds, the last one isn't. When I start all the nodes together at
the same time, they all receive equal amounts of reads/writes (about
230).
When I bring node 4 down and back up again, node 4's load
fluctuates between the 230 it used to get and, at times, no traffic at
all. The other 3 still receive the same amount of traffic, and no errors
whatsoever appear in the logs. Any ideas what could be causing this
fluctuation on node 4 after I restarted it?


Re: BUILD FAILURE

2009-12-22 Thread Eric Evans
On Mon, 2009-12-21 at 17:50 -0800, Adam Fisk wrote:
 Any reason you guys do it this way? It's *much* easier to maintain a
 working ant build using maven as the base than the other way around
 (mvn ant:ant). 

I can't speak for anyone but myself ... but:

1. I don't care enough for the special features (the stuff above and beyond
what ant provides) to invest the effort in learning it.
2. Maven triggers my gag reflex.


-- 
Eric Evans
eev...@rackspace.com



Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
The problem is not on the server side but on the client side.
The connections are not open; they are closed but in TIME_WAIT status, so
they still occupy a port.

On Tue, Dec 22, 2009 at 7:50 PM, Ran Tavory ran...@gmail.com wrote:

 I don't know how keeping the connections open behaves at scale. I suppose
 if you have a 10 to 1 ratio of Cassandra clients to Cassandra servers
 (probably a typical ratio), then you may be using too much server resources.




RE: MultiThread Client problem with thrift

2009-12-22 Thread Brian Burruss
I don't close the connection to a server unless I get exceptions. And when I
close a connection, I try a new server in the cluster, just to keep the
connections spread across the cluster.

Should I be closing them? If the connection is closed by the client or the
server, I'll just reconnect.


From: Ran Tavory [ran...@gmail.com]
Sent: Tuesday, December 22, 2009 9:50 AM
To: cassandra-user@incubator.apache.org
Subject: Re: MultiThread Client problem with thrift

I don't know how keeping the connections open affects at scale. I suppose if 
you have 10 to 1 ratio of cassandra clients to cassandra server (probably a 
typical ratio) then you may be using too much server resources

On Tue, Dec 22, 2009 at 4:46 PM, matthew hawthorne 
mhawtho...@gmail.commailto:mhawtho...@gmail.com wrote:
On Tue, Dec 22, 2009 at 9:10 AM, Ran Tavory 
ran...@gmail.commailto:ran...@gmail.com wrote:
 Not at expert in this field, but I think what you want is use a connection
 pool and NOT close the connections - reuse them. Only idle connections are
 released after, say 1sec. Also, with a connection pool it's easy
 to throttle the application, you can tell the pool to block if all 50
 connections, or how many you define are allowed.

I did something very similar to this.  A difference in my approach is
that I did not release idle connections after a specific time period,
instead I performed a liveness check on each connection after
obtaining it from the pool, like this:

// get client connection from pool
Cassandra.Client client =

try {
 client.getInputProtocol().getTransport().flush();
} catch (TTransportException e) {
 // connection is invalid, obtain new connection
}

It seemed to work during my testing, not sure if the thrift specifics
are 100% correct (meaning I'm not sure if the catch block will work
for all situations involving stale or expired connections).

-matt


 On Tue, Dec 22, 2009 at 4:01 PM, Richard Grossman 
 richie...@gmail.commailto:richie...@gmail.com
 wrote:

 So I can't use it.
 But I've make my own connection pool. This are not fix nothing because the
 problem is lower than even java. In fact the socket is closed and java
 consider it as close but the system keep the Socket in the  state TIME_WAIT.
 Then the port used is actually still in use.
 So my question is that is there people that manage to open multiple
 connection and ride off the TIME_WAIT. No matter in which language PHP or
 Python etc...
 Thanks
 On Tue, Dec 22, 2009 at 2:55 PM, Ran Tavory 
 ran...@gmail.commailto:ran...@gmail.com wrote:

 I don't have a 0.5.0-beta2 version, no. It's not too difficult to add it,
 but I haven't done so myself, I'm using 0.4.2

 On Tue, Dec 22, 2009 at 2:42 PM, Richard Grossman 
 richie...@gmail.commailto:richie...@gmail.com
 wrote:

 Yes of course but do you have updated to cassandra 0.5.0-beta2 ?

 On Tue, Dec 22, 2009 at 2:30 PM, Ran Tavory 
 ran...@gmail.commailto:ran...@gmail.com wrote:

 Would connection pooling work for you?
 This Java client http://code.google.com/p/cassandra-java-client/ has
 connection pooling.
 I haven't put the client under stress yet so I can't testify, but this
 may be a good solution for you

 On Tue, Dec 22, 2009 at 2:22 PM, Richard Grossman 
 richie...@gmail.commailto:richie...@gmail.com
 wrote:

 I agree it's solve my problem but can give a bigger one.
 The problem is I can't succeed to prevent opening a lot of connection

 On Tue, Dec 22, 2009 at 1:51 PM, Jaakko 
 rosvopaalli...@gmail.commailto:rosvopaalli...@gmail.com
 wrote:

 Hi,

 I don't know the particulars of java implementation, but if it works
 the same way as Unix native socket API, then I would not recommend
 setting linger to zero.

 SO_LINGER option with zero value will cause TCP connection to be
 aborted immediately as soon as the socket is closed. That is, (1)
 remaining data in the send buffer will be discarded, (2) no proper
 disconnect handshake and (3) receiving end will get TCP reset.

 Sure this will avoid TIME_WAIT state, but TIME_WAIT is our friend and
 is there to avoid packets from old connection being delivered to new
 incarnation of the connection. Instead of avoiding the state, the
 application should be changed so that TIME_WAIT will not be a
 problem.
 How many open files you can see when the exception happens? Might be
 that you're out of file descriptors.

 -Jaakko
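As an aside, the SO_LINGER-zero behaviour described above can be demonstrated with nothing but java.net (a hedged sketch, not a recommendation — per the caveats above, it trades TIME_WAIT for an abortive RST close):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Demonstration only (NOT recommended for production): SO_LINGER with a
// zero timeout makes close() abort the connection with a TCP RST instead
// of a FIN handshake, so this side skips TIME_WAIT.
public class LingerDemo {
    public static boolean closeWithReset(Socket s) {
        try {
            s.setSoLinger(true, 0);             // linger on, 0s => abortive close
            boolean set = s.getSoLinger() == 0; // getSoLinger() reports 0 when armed
            s.close();                          // sends RST, discards unsent data
            return set;
        } catch (IOException e) {
            return false;
        }
    }

    // Self-contained check against a loopback listener.
    public static boolean demo() {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            return closeWithReset(client);
        } catch (IOException e) {
            return false;
        }
    }
}
```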


 On Tue, Dec 22, 2009 at 8:17 PM, Richard Grossman richie...@gmail.com wrote:
  Hi
  To all who are interested: I've found a solution that seems not
  recommended, but it works.
  When opening a socket, set this:
 tSocket.getSocket().setReuseAddress(true);
 tSocket.getSocket().setSoLinger(true, 0);
  It prevents having a lot of connections in the TIME_WAIT state, but it
  is not recommended.
 

Re: TimeUUID Partitioning

2009-12-22 Thread Richard Grossman
I don't understand your last statement ("though so do date-based strings
and OrderPreservingPartitioner").
Do you mean that using TimeUUID provides a better distribution when
using OrderPreservingPartitioner? Or is there no effect when using
OrderPreservingPartitioner?

Thanks

On Tue, Dec 22, 2009 at 6:19 PM, Eric Evans eev...@rackspace.com wrote:

 On Tue, 2009-12-22 at 13:48 +0100, Daniel Lundin wrote:
  Has anybody already spent energy + time thinking about generic
  TimeUUID partitioning? Seems like it could be a useful thing, since
  time series data is quite common.

 I think it's possible, but it's going to provide very poor distribution
 properties (though so do date-based strings and the
 OrderPreservingPartitioner).

 --
 Eric Evans
 eev...@rackspace.com




Re: MultiThread Client problem with thrift

2009-12-22 Thread Richard Grossman
When I try to reuse a socket for multiple Thrift operations across
threads, I always get an exception.

On Tue, Dec 22, 2009 at 9:28 PM, Brian Burruss bburr...@real.com wrote:

 i don't close the connection to a server unless i get exceptions.  and when
 i close the connection i try a new server in the cluster just to keep the
 connections spread across the cluster.

 should i be closing them?  if the connection is closed by client or server
 i'll just reconnect.

 
 From: Ran Tavory [ran...@gmail.com]
 Sent: Tuesday, December 22, 2009 9:50 AM
 To: cassandra-user@incubator.apache.org
 Subject: Re: MultiThread Client problem with thrift

 I don't know how keeping the connections open behaves at scale. I suppose
 if you have a 10 to 1 ratio of Cassandra clients to Cassandra servers (probably
 a typical ratio), then you may be using too many server resources.

 On Tue, Dec 22, 2009 at 4:46 PM, matthew hawthorne mhawtho...@gmail.com wrote:
 On Tue, Dec 22, 2009 at 9:10 AM, Ran Tavory ran...@gmail.com wrote:
  Not an expert in this field, but I think what you want is to use a
 connection pool and NOT close the connections - reuse them. Only idle
 connections are released after, say, 1 sec. Also, with a connection pool
 it's easy to throttle the application: you can tell the pool to block if
 all 50 connections (or however many you allow) are in use.

 I did something very similar to this.  A difference in my approach is
 that I did not release idle connections after a specific time period;
 instead, I performed a liveness check on each connection after
 obtaining it from the pool, like this:

 // get client connection from pool (pool API omitted in the original)
 Cassandra.Client client = ...;

 try {
     // the flush throws if the underlying transport has gone stale
     client.getInputProtocol().getTransport().flush();
 } catch (TTransportException e) {
     // connection is invalid, obtain a new connection
 }

 It seemed to work during my testing; I'm not sure if the thrift specifics
 are 100% correct (meaning I'm not sure if the catch block will work
 for all situations involving stale or expired connections).

 -matt


 

Re: MultiThread Client problem with thrift

2009-12-22 Thread Jonathan Ellis
On Tue, Dec 22, 2009 at 1:28 PM, Brian Burruss bburr...@real.com wrote:
 i don't close the connection to a server unless i get exceptions.  and when i 
 close the connection i try a new server in the cluster just to keep the 
 connections spread across the cluster.

right, that is the sane way to do it rather than imposing a thrift
connection overhead on each operation.

-Jonathan


Re: TimeUUID Partitioning

2009-12-22 Thread Richard Grossman
OK, you're absolutely right, and I'm looking for a way to prevent this.

So my question might be stupid, but why would TimeUUID distribute better?


On Tue, Dec 22, 2009 at 9:55 PM, Eric Evans eev...@rackspace.com wrote:

 On Tue, 2009-12-22 at 21:32 +0200, Richard Grossman wrote:
  Don't understand your last statement (though so does date-based
  strings and OrderPreservingPartioner).
  What do you mean that using TimeUUID provide a better distribution
  when using OrderPreservingPartioner. Or there is no effect if using
  OrderPreservingPartioner

 I mean that if you are using a namespace that is based on some date/time
 range, and you partition that up into smaller date/time ranges, and then
 store time series data, your data will be written to the node that
 corresponds to the date/time, and not to any of the others.

 Imagine that you have 5 nodes and they are partitioned:

 A - 2009
 B - 2010
 C - 2011
 D - 2012
 E - 2013

 Any writes occurring between now and Jan 1, 2010 will go to node A, at
 which point all writes will go to node B for the 365 days that follow,
 and so on.
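 The year-per-node example above can be sketched mechanically (a toy model
 of an order-preserving ring, not Cassandra's real partitioner code):

```java
import java.util.TreeMap;

// Toy model of an order-preserving ring partitioned on year, mirroring the
// example above: the node for a key is the one owning the enclosing range.
public class DatePartitionDemo {
    private static final TreeMap<Integer, String> RING = new TreeMap<>();
    static {
        RING.put(2009, "A");
        RING.put(2010, "B");
        RING.put(2011, "C");
        RING.put(2012, "D");
        RING.put(2013, "E");
    }

    public static String nodeFor(int year) {
        // floorEntry finds the greatest range start <= the key's year
        return RING.floorEntry(year).getValue();
    }
}
```

 Every write stamped with the current date maps to the same node, which is
 exactly the hotspot being described: nodeFor(2009) is "A" for all of 2009,
 then every 2010 write lands on "B".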

 --
 Eric Evans
 eev...@rackspace.com




Re: TimeUUID Partitioning

2009-12-22 Thread Eric Evans
On Tue, 2009-12-22 at 23:31 +0200, Richard Grossman wrote:
 So my question might be stupid but why the timeUUID will distribute
 better ?

It won't, it's the same problem either way.

-- 
Eric Evans
eev...@rackspace.com



Re: Potential problem with 0.5 branch (Possibly in gossiping?)

2009-12-22 Thread Jaakko
Hi,

Which revision number are you running?

Can you see any log lines related to a node being UP or dead? (like
"InetAddress X.X.X.X is now dead" or "Node X.X.X.X has restarted, now
UP again"). These messages come from the Gossiper and indicate if it
for some reason thinks the node is dead. The level of these messages is
info.

Another thing: can you see any log messages like "Node X.X.X.X
state normal, token XXX"? These are on debug level.

-Jaakko


On Wed, Dec 23, 2009 at 12:59 AM, Ramzi Rabah rra...@playdom.com wrote:
 I just recently upgraded to latest in 0.5 branch, and I am running
 into a serious issue. I have a cluster with 4 nodes, rackunaware
 strategy, and using my own tokens distributed evenly over the hash
 space. I am writing/reading equally to them at an equal rate of about
 230 reads/writes per second(and cfstats shows that). The first 3 nodes
 are seeds, the last one isn't. When I start all the nodes together at
 the same time, they all receive equal amounts of reads/writes (about
 230).
 When I bring node 4 down and bring it back up again, node 4's load
 fluctuates between the 230 it used to get to sometimes no traffic at
 all. The other 3 still have the same amount of traffic, and no errors
 whatsoever appear in the logs. Any ideas what could be causing this
 fluctuation on node 4 after I restarted it?



Re: Get_count method error?

2009-12-22 Thread Adam Fisk
Just wanted to add I'm seeing the same thing on trunk running
everything with a consistency level of one.

I'm running -r893382 (a week back or so I believe).

All the Best,

-Adam


On Tue, Dec 15, 2009 at 7:25 PM, Brandon Williams dri...@gmail.com wrote:
 On Tue, Dec 15, 2009 at 9:19 PM, JKnight JKnight beukni...@gmail.com
 wrote:

 Dear Mr Jonathan,

 I have tested on version 0.4.2 and 0.5.0.
 The attachment contains my test case.

 You are inserting and performing get_count with a consistency level of one,
 but deleting with a consistency level of zero.  Can you try deleting with a
 consistency level of one also?
 -Brandon



-- 
Adam Fisk
http://www.littleshoot.org | http://adamfisk.wordpress.com |
http://twitter.com/adamfisk


Re: Potential problem with 0.5 branch (Possibly in gossiping?)

2009-12-22 Thread Ramzi Rabah
Hi Jaakko, thanks for your response.

I compiled the very latest from the 0.5 branch yesterday (whatever
yesterday night's build was). I do see the "Node X.X.X.X is dead" and
"Node X.X.X.X has restarted" messages.

This shows up on all 3 other servers:
 INFO [Timer-1] 2009-12-22 20:38:43,738 Gossiper.java (line 194)
InetAddress /10.6.168.20 is now dead.

Node /10.6.168.20 has restarted, now UP again
 INFO [GMFD:1] 2009-12-22 20:43:12,812 StorageService.java (line 475)
Node /10.6.168.20 state jump to normal

The first time I restarted the node it seemed fine, but the
second time I restarted it, this is what cfstats shows for
traffic on it:

Column Family: Datastore
Memtable Columns Count: 407
Memtable Data Size: 42268
Memtable Switch Count: 1
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0

and then it went up and now it's back to:

  Column Family: Datastore
Memtable Columns Count: 2331
Memtable Data Size: 242364
Memtable Switch Count: 1
Read Count: 107
Read Latency: 0.486 ms.
Write Count: 113
Write Latency: 0.000 ms.
Pending Tasks: 0

which is half the traffic the other nodes are showing. The other 3
nodes are showing a consistent ~230 reads/writes per second, which
node 4 was showing before it was restarted. I hope data is not being
lost in the process?






Re: Get_count method error?

2009-12-22 Thread Jonathan Ellis
As I said, a test case against the example CF definitions
illustrating the problem would help us fix the problem.

-Jonathan

On Tue, Dec 22, 2009 at 8:56 PM, Adam Fisk a...@littleshoot.org wrote:
 Just wanted to add I'm seeing the same thing on trunk running
 everything with a consistency level of one.

 I'm running -r893382 (a week back or so I believe).


Why seed can't startup in Bootstrap mode ?

2009-12-22 Thread zhangyf2007
I found that if a node has been set as a seed, it can't start up in
Bootstrap mode even though AutoBootstrap is true. Why can't the seed
node start up in Bootstrap mode?

Thanks!


Re: Why seed can't startup in Bootstrap mode ?

2009-12-22 Thread Jonathan Ellis
Seeds are supposed to always be part of the cluster by design, so
adding one in bootstrap mode makes no sense.

If you need to add seeds to your cluster, designate existing nodes as such.

2009/12/22 zhangyf2007 zhangyf2...@gmail.com:
 I found that if one node has been set as seed it can't startup in
 Bootstarp mode even though AutoBootstrap is true. Why the seed node
 can't startup in Bootstrap mode ?

 Thanks!



Re: Potential problem with 0.5 branch (Possibly in gossiping?)

2009-12-22 Thread Ramzi Rabah
Watching it for a little longer: it went up again to 230, where it
settled for a few minutes, and now it has dropped back to 0. Very
strange.






Re: Potential problem with 0.5 branch (Possibly in gossiping?)

2009-12-22 Thread Jaakko
OK, just to make sure: you can see these gossip/state messages when
the node is going down and coming back up again, but not afterwards?
That is, after you restart the node, you see "10.6.168.20 ... now UP"
and "state jump to normal" only once, and not again when the write rate
goes to zero and/or comes back to 230?







Re: How know node is fully up?

2009-12-22 Thread Brian Burruss
I never heard from anyone about this.  I think it is important for bringing 
nodes out of service during upgrades so no data loss occurs.  Also when 
introducing a new node you need to know when it is fully populated.

Thx!

Brian Burruss bburr...@real.com wrote:


How can i tell that a node is completely up and taking reads and writes?

- at startup?
- after new bootstrap?
- after a node has been unavailable for some time and rejoins the cluster?

i see the INFO [main] [CassandraDaemon.java:141] Cassandra starting up... 
message in the log, but it seems to have happened way too fast after i 
simulated a crash.

using tpstats i don't see any ROW-READ-STAGE completed, but lots of 
ROW-MUTATION-STAGE completed which seems to be correct for a node that is still 
sync'ing with the cluster after being unavailable.

.. but how do i know ;)

thx!


Re: How know node is fully up?

2009-12-22 Thread Jonathan Ellis
It's up when it logs "Cassandra starting up..." and starts listening
for Thrift connections.
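For scripting that check, a plain TCP connect against the Thrift port is enough to detect "listening" (a hedged sketch; it only proves the socket is accepting connections, not that bootstrap or streaming has finished):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal liveness probe: try a TCP connect with a timeout. This detects
// "the daemon is accepting Thrift connections", nothing more.
public class PortProbe {
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // something accepted the connection
        } catch (IOException e) {
            return false;  // refused, timed out, or unreachable
        }
    }

    // Self-contained check against a local listener.
    public static boolean selfTest() {
        try (ServerSocket srv = new ServerSocket(0)) {
            return isListening("127.0.0.1", srv.getLocalPort(), 500);
        } catch (IOException e) {
            return false;
        }
    }
}
```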




Re: Can not insert to one key?

2009-12-22 Thread Jonathan Ellis
Sounds like you are not inserting new data with a high enough
timestamp to override a deletion done earlier.
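The rule can be stated as a one-liner (a simplified model of last-write-wins reconciliation, not Cassandra's actual classes):

```java
// Simplified model of timestamp reconciliation: the write with the higher
// timestamp wins, and a tie goes to the tombstone, so an insert must carry
// a timestamp strictly greater than the earlier deletion to become visible.
public class ReconcileDemo {
    public static boolean insertWins(long insertTimestamp, long tombstoneTimestamp) {
        return insertTimestamp > tombstoneTimestamp;
    }
}
```

So if the delete below was issued with a timestamp at or above the one used for the re-insert, the columns stay marked for delete, which matches the symptom described.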

On Tue, Dec 22, 2009 at 11:20 PM, JKnight JKnight beukni...@gmail.com wrote:
 Dear all,

 I have a problem with one key: I cannot insert data to it.
 When I check the code, all columns inside this key have the field markForDelete =
 true. I try to insert data into those columns, but when I check the markForDelete
 field of all columns inside this key, it is still true???

 The attachment is the data.
 Here is my keyspace.
     <Keyspace Name="Fan">
           <ColumnFamily CompareWith="BytesType" Name="Fan"/>
           <ColumnFamily CompareWith="BytesType" Name="VipOfUser"/>
     </Keyspace>
 And the key error is 487579 in ColumnFamily Fan.


 Could you help me?
 Thanks a lot for your support.
 --
 Best regards,
 JKnight