Thanks Aaron,

Telnet works (in both directions).
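
For anyone who wants to script the same check instead of using telnet, here is a minimal Python sketch. It only tests whether a TCP connection to the storage port (7000, as mentioned in the quoted mail below) can be opened. The function name can_connect and the 5 second default timeout are my own assumptions, not anything Cassandra provides.

```python
import socket


def can_connect(host, port=7000, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper: it mirrors a manual telnet check and nothing more.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, run from the node that reports the other one as down:
#   can_connect("10.49.9.109")
```

A successful connection only shows that the port is reachable. It says nothing about whether gossip state on either side is correct.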

After a normal restart (i.e. without discarding ring state) of the node
reporting the other one as down, the ring shows "up" again. So a node
restart fixes the incorrect state.

I see this error occasionally.

I will further investigate and post more details when it happens again.

2012/10/18 aaron morton <aa...@thelastpickle.com>

> You can double check that the node reporting 9.109 as down can telnet to
> port 7000 on 9.109.
>
> Then I would restart 9.109 with -Dcassandra.load_ring_state=false added as
> a JVM param in cassandra-env.sh.
>
> If it still shows as down, can you post the output from nodetool gossipinfo
> from 9.109 and the node that sees 9.109 as down?
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/10/2012, at 8:45 PM, Rene Kochen <rene.koc...@schange.com> wrote:
>
> I have a four node EC2 cluster.
>
> Three machines show via nodetool ring that all machines are UP.
> One machine shows via nodetool ring that one machine is DOWN.
>
> If I take a closer look at the machine reporting the other machine as
> down, I see the following:
>
> - StorageService.UnreachableNodes = 10.49.9.109
> - FailureDetector.SimpleStates: 10.49.9.109 = UP
>
> So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see
> in the logging that there is communication between 10.49.9.109 and the
> machine reporting it as down.
>
> How or when is a node removed from the UnreachableNodes list and reported
> as UP again via nodetool ring?
>
> I use Cassandra 1.0.11
>
> Thanks!
>
> Rene
>
>
>
