Maybe your storage port (<StoragePort>7000</StoragePort>) is being blocked by iptables or some other firewall, or maybe the node is bound to localhost (the <ListenAddress> tag) instead of an IP address.
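Both possibilities can be checked from one of the existing nodes. One check is a plain TCP connect to the new node's storage port. The other is a look at what the configured ListenAddress value resolves to. The class below is a minimal sketch of a hypothetical diagnostic helper, not part of Cassandra. `StoragePortCheck`, `isReachable` and `isLoopback` are made-up names, and the sketch assumes the storage port is 7000 as in the tag quoted above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper (not part of Cassandra) for the two
// failure modes above: a firewalled storage port, or a node bound to localhost.
final class StoragePortCheck {

    private StoragePortCheck() {}

    /** True if a TCP connection to host:port is accepted within timeoutMs milliseconds. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused (nothing listening on that address), timed out (often a
            // firewall silently dropping packets), or the host name did not resolve.
            return false;
        }
    }

    /** True if the given ListenAddress value resolves to a loopback address such as 127.0.0.1. */
    static boolean isLoopback(String listenAddress) throws UnknownHostException {
        return InetAddress.getByName(listenAddress).isLoopbackAddress();
    }
}
```

For example, `StoragePortCheck.isReachable("192.168.2.18", 7000, 2000)` run on an existing node returns false in either of two cases: the port is filtered, or the new node listens only on 127.0.0.1. Running `telnet 192.168.2.18 7000` gives the same answer without any code.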
Hope this helps,
Dimitry

On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <thibaut.br...@trendiction.com> wrote:
> Hi,
>
> I have the same problem with 0.6.5.
>
> New nodes hang forever in bootstrap mode (no streams are being opened), and
> the receiver thread just waits for data forever:
>
> INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
> INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>
> Stack trace:
>
> "pool-1-thread-53" prio=10 tid=0x00000000412f2800 nid=0x215c runnable [0x00007fd7cf217000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         - locked <0x00007fd7e77e0520> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> The best approach is to manually select the tokens; see the Load Balancing
>> section of http://wiki.apache.org/cassandra/Operations. Also:
>>
>> Are there any log messages in the existing nodes or the new one which
>> mention each other?
>>
>> Is this a production system? Is it still running?
>>
>> Sorry there is not a lot to go on; it sounds like you've done the right
>> thing. I'm assuming things like the cluster name, seed list and port numbers
>> are set correctly, as the new node got some data.
>>
>> You'll need to dig through the logs a bit more to confirm that bootstrapping
>> started and to see what the last message it logged was.
>>
>> Good luck.
>> Aaron
>>
>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>
>> Hi Aaron,
>> Thanks for your reply.
>>
>> We still haven't solved this, unfortunately.
>>
>>> How did you start the bootstrap for the .18 node?
>>
>> The standard way: we set "AutoBootstrap" to true and added all the servers
>> from the working ring as seeds.
>>
>>> Was it the .18 or the .17 node you tried to add?
>>
>> We first tried adding .17. It streamed for a while, took on 50 GB of load,
>> stopped streaming, but then didn't enter the ring. We left it for a few days
>> to see if it would come in, but no luck. After that we ran the decommission
>> and removeToken operations (in that order).
>> Since we couldn't get .17 in, we tried again with .18. Before doing so we
>> increased RpcTimeoutInMillis from 1000 to 10000, having read that a low
>> timeout may cause nodes not to enter the ring. It's been going since Friday
>> and still, like .17, won't come into the ring.
>>
>>> Does it have a token in the config or did you use nodetool move to set it?
>>
>> No, we didn't manually set the token in the config; rather, we were relying
>> on the token being assigned during bootstrap by the RandomPartitioner.
>>
>> Again, thanks for the help.
>>
>> Dimitry.
>>
>> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>
>>> Dimitry, did you get anywhere with this?
>>>
>>> Was it the .18 or the .17 node you tried to add? How did you start the
>>> bootstrap for the .18 node? Does it have a token in the config or did you
>>> use nodetool move to set it?
>>>
>>> I had a quick look at the code. AFAIK the message about removing the fat
>>> client is logged when the node does not have a record of the token the other
>>> node has.
>>>
>>> Aaron
>>>
>>> On 26 Oct, 2010, at 10:42 PM, Dimitry Lvovsky <dimi...@reviewpro.com> wrote:
>>>
>>> Hi All,
>>> We recently upgraded from 0.6.5 to 0.6.6, after which we tried adding a new
>>> node to our cluster. We left it bootstrapping, and after 3 days it still
>>> refused to join the ring. The strange thing is that nodetool info shows 50 GB
>>> of load, and nodetool ring shows that it sees the rest of the ring, which it
>>> is not part of. We tried the process again with another server and got the
>>> same result:
>>>
>>> // from machine 192.168.2.18
>>>
>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>>> 131373516047318302934572185119435768941
>>> Load             : 52.85 GB
>>> Generation No    : 1287761987
>>> Uptime (seconds) : 323157
>>> Heap Memory (MB) : 795.42 / 1945.63
>>>
>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>>> Address       Status  Load      Range                                      Ring
>>>                                 158573510920250391466717289405976537674
>>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202     |<--|
>>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055     |   |
>>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065    |   |
>>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674    |-->|
>>>
>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>>> Mode: Bootstrapping
>>> Not sending any streams.
>>> Not receiving any streams.
>>>
>>> What's more, while looking at the log of one of the nodes, I see gossip
>>> messages from 192.168.2.17, the first node we tried to add to the cluster,
>>> even though it was not running at the time of the log message:
>>>
>>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>>>
>>> Thanks in advance for the help,
>>> Dimitry
>>
>>
>> --
>> Dimitry Lvovsky
>> Director of Engineering
>> ReviewPro
>> www.reviewpro.com
>> +34 616 337 103
>
--
Dimitry Lvovsky
Director of Engineering
ReviewPro
www.reviewpro.com
+34 616 337 103
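Two questions in this thread come down to simple arithmetic on RandomPartitioner tokens, which lie between 0 and 2^127. The first is Aaron's suggestion to pick tokens manually. The second is which existing range the bootstrapping node landed in. Below is a minimal sketch under that assumption; `TokenRing`, `ownerOf` and `balancedTokens` are hypothetical names, not Cassandra APIs.

- `ownerOf` treats each node as owning the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around at the end of the ring.
- `balancedTokens` uses the evenly spaced layout i * (2^127 / N) commonly recommended for RandomPartitioner.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not Cassandra code) for reasoning about
// RandomPartitioner tokens, whose space runs from 0 to 2^127.
final class TokenRing {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Node addresses keyed by their token, kept in ring order.
    private final TreeMap<BigInteger, String> nodesByToken = new TreeMap<>();

    void addNode(String token, String address) {
        nodesByToken.put(new BigInteger(token), address);
    }

    /**
     * Node responsible for a token: the first node whose token is >= t,
     * wrapping around to the lowest token once t passes the highest one.
     */
    String ownerOf(String token) {
        if (nodesByToken.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<BigInteger, String> e = nodesByToken.ceilingEntry(new BigInteger(token));
        return (e != null ? e : nodesByToken.firstEntry()).getValue();
    }

    /** Evenly spaced tokens i * (2^127 / n) for i = 0 .. n-1. */
    static List<BigInteger> balancedTokens(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(n));
        List<BigInteger> tokens = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)));
        }
        return tokens;
    }
}
```

Load the ring with the four tokens from the nodetool ring listing. Then `ownerOf("131373516047318302934572185119435768941")`, the token nodetool info reports for the bootstrapping .18 node, returns "192.168.2.21". That is the most heavily loaded node (61.04 GB) in the listing. `balancedTokens(5)` gives candidate tokens for a five-node ring. These could be set as InitialToken on a new node or applied to existing nodes with nodetool move; moving an existing node streams data between nodes.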