Watching it a little longer, I saw it go back up to 230, where it settled for a few minutes; now it has dropped back to 0. Very strange.
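A rough sketch for lining these drops up against what the Gossiper thinks of the node: the script below scans log lines for the two messages quoted further down ("is now dead" and "has restarted, now UP again") and groups them by address. The function name `gossip_events` is made up for illustration, the patterns are taken only from the log lines in this thread, and other builds may word these messages differently.

```python
import re
from collections import defaultdict

# Patterns copied from the log lines quoted in this thread; treat them as
# assumptions about the message wording, not as a stable log format.
DEAD = re.compile(r"InetAddress /(\S+) is now dead")
UP = re.compile(r"Node /(\S+) has restarted, now UP again")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def gossip_events(lines):
    """Return {address: [(timestamp_or_None, "dead" | "up"), ...]} in log order.

    `gossip_events` is a hypothetical helper, not part of Cassandra.
    """
    events = defaultdict(list)
    for line in lines:
        ts = TIMESTAMP.search(line)
        stamp = ts.group(0) if ts else None  # some lines carry no timestamp
        for pattern, state in ((DEAD, "dead"), (UP, "up")):
            match = pattern.search(line)
            if match:
                events[match.group(1)].append((stamp, state))
    return dict(events)
```

Passing an open log file (wherever the node writes its log) to `gossip_events` gives a per-address list of dead/up transitions that can be compared with the times the cfstats counters fall to 0.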
On Tue, Dec 22, 2009 at 7:01 PM, Ramzi Rabah <rra...@playdom.com> wrote:
> Hi Jaako thanks for your response.
>
> I compiled the very latest from 0.5 branch yesterday (whatever
> yesterday nights build was). I do see that Node X.X.X.X is dead, and
> Node X.X.X.X has restarted.
>
> This show up on all the 3 other servers:
> INFO [Timer-1] 2009-12-22 20:38:43,738 Gossiper.java (line 194)
> InetAddress /10.6.168.20 is now dead.
>
> Node /10.6.168.20 has restarted, now UP again
> INFO [GMFD:1] 2009-12-22 20:43:12,812 StorageService.java (line 475)
> Node /10.6.168.20 state jump to normal
>
> This time the first time I restarted the node it seemed fine, but the
> second time I restarted it, this is what cfstats is showing for
> traffic on it :
>
> Column Family: Datastore
> Memtable Columns Count: 407
> Memtable Data Size: 42268
> Memtable Switch Count: 1
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Tasks: 0
>
> and then it went up and now it's back to:
>
> Column Family: Datastore
> Memtable Columns Count: 2331
> Memtable Data Size: 242364
> Memtable Switch Count: 1
> Read Count: 107
> Read Latency: 0.486 ms.
> Write Count: 113
> Write Latency: 0.000 ms.
> Pending Tasks: 0
>
> which is half the traffic the other nodes are showing. The other 3
> nodes are showing a consistent ~230 reads/writes per second, which
> node 4 was showing before it was restarted. I hope data is not being
> lost in the process?
>
>
> On Tue, Dec 22, 2009 at 4:43 PM, Jaakko <rosvopaalli...@gmail.com> wrote:
>> Hi,
>>
>> Which revision number you are running?
>>
>> Can you see any log lines related to node being UP or dead? (like
>> "InetAddress X.X.X.X is now dead" or "Node X.X.X.X has restarted, now
>> UP again"). These messages come from the Gossiper and indicate if it
>> for some reason thinks the node is dead. Level of these messages is
>> info.
>>
>> Another thing is: can you see any log messages like "Node X.X.X.X
>> state normal, token XXX"? These are on debug level.
>>
>> -Jaakko
>>
>>
>> On Wed, Dec 23, 2009 at 12:59 AM, Ramzi Rabah <rra...@playdom.com> wrote:
>>> I just recently upgraded to latest in 0.5 branch, and I am running
>>> into a serious issue. I have a cluster with 4 nodes, rackunaware
>>> strategy, and using my own tokens distributed evenly over the hash
>>> space. I am writing/reading equally to them at an equal rate of about
>>> 230 reads/writes per second(and cfstats shows that). The first 3 nodes
>>> are seeds, the last one isn't. When I start all the nodes together at
>>> the same time, they all receive equal amounts of reads/writes (about
>>> 230).
>>> When I bring node 4 down and bring it back up again, node 4's load
>>> fluctuates between the 230 it used to get to sometimes no traffic at
>>> all. The other 3 still have the same amount of traffic. And no errors
>>> what so ever seen in logs. Any ideas what can be causing this
>>> fluctuation on node 4 after I restarted it?
>>>
>>
>