You can click "follow" on the JIRA issue to be notified of changes in its status.
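
While you wait on the ticket, it may help to rule out a plain network problem after the node comes back. Below is a minimal sketch, assuming Python 3 is available on one of the other nodes; `can_connect` is a hypothetical helper name, and port 7000 is simply the port mentioned in your report. It only tests whether a TCP connection can be opened. It says nothing about whether Cassandra has re-established its own connections.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Port 7000 is the default here only because it is the port named in the
    report; adjust it if your cluster is configured differently.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, calling `can_connect("10.237.1.135")` from 10.237.4.85 after the restart: if it returns True while reads still time out, the network path is fine and the stale connections inside Cassandra are the more likely cause.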

On Sun, Dec 27, 2009 at 8:15 PM, <mail.list.steel.men...@gmail.com> wrote:
> Yes, it's seems IS the same problem, does it's has no any fix yet?
>
> ---------END----------
>
> -----Original Message-----
> From: Ramzi Rabah [mailto:rra...@playdom.com]
> Sent: Monday, December 28, 2009 12:43 AM
> To: cassandra-user@incubator.apache.org
> Subject: Re: bug when node down-up??
>
> I believe this is the same problem as
> https://issues.apache.org/jira/browse/CASSANDRA-651
>
> On Sun, Dec 27, 2009 at 7:38 AM, <mail.list.steel.men...@gmail.com> wrote:
>> HI,guys:
>>
>> I probably found a bug, it’s seemed on-line cluster can’t resistant
>> rebooting of single node, although it suppose to be.
>>
>> suppose a cluster contained 8 nodes, which contained about 10000 rows(key
>> range from 1 to 10000):
>>
>> Address       Status  Load       Range                                    Ring
>>                                  170141183460469231731687303715884105728
>> 10.237.4.85   Up      757.13 MB  21267647932558653966460912964485513216   |<--|
>> 10.237.1.135  Up      761.54 MB  42535295865117307932921825928971026432   |   ^
>> 10.237.1.137  Up      748.02 MB  63802943797675961899382738893456539648   v   |
>> 10.237.1.139  Up      732.36 MB  85070591730234615865843651857942052864   |   ^
>> 10.237.1.140  Up      725.6 MB   106338239662793269832304564822427566080  v   |
>> 10.237.1.141  Up      726.59 MB  127605887595351923798765477786913079296  |   ^
>> 10.237.1.143  Up      728.16 MB  148873535527910577765226390751398592512  v   |
>> 10.237.1.144  Up      745.69 MB  170141183460469231731687303715884105728  |-->|
>>
>> (1) Read keys range [1-10000], all keys read out ok ( client send read
>> request directly to 10.237.4.85, 10.237.1.137, 10.237.1.140, 10.237.1.143 )
>>
>> (2) Turn-off 10.237.1.135 while remain pressure, some read request will
>> time out,
>> after all nodes know 10.237.1.135 has down (about 10 s later), all read
>> request become ok again, that’s fine
>>
>> (3) After turn-on 10.237.1.135(and cassandra service, certainly), some
>> read request will time out again, and will remain FOREVER even all nodes
>> know 10.237.1.135 has up,
>> That’s a PROBLEM!
>>
>> (4) Reboot 10.237.1.135, problem remains.
>>
>> (5) If stop pressure and reboot whole cluster then perform step 1, all
>> things are fine, again…..
>>
>> All read request use Quorum policy, version of Cassandra is
>> apache-cassandra-incubating-0.5.0-beta2, and I’ve tested
>> apache-cassandra-incubating-0.5.0-RC1, problem remains.
>>
>> After read system.log, I found after 10.237.1.135 down and up again, other
>> nodes will not establish tcp connection to it(on tcp port 7000 ) forever!
>> And read request sent to 10.237.1.135(into Pending-Writes because socket
>> channel is closed) will not sent to net forever(from observing tcpdump).
>>
>> It’s seems when 10.237.1.135 going down in step2, some socket channel was
>> reset ,
>> after 10.237.1.135 come back, these socket channel remain closed, forever….,
>> I don’t know….
>>
>> Sorry for my poor English…, hope I’ve stated my problem clear.
>>
>> ---------END----------