On Wed, Jun 14, 2017 at 9:11 AM, wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>
wrote:

> Hi,
> Cluster set up:
> 1 DC with 5 nodes (each node having 700GB data)
> 1 keyspace with RF of 2
> write CL is LOCAL_ONE
> read CL is LOCAL_QUORUM
>
> One node was down for about 1 hour because of OOM issue. During the down
> period, all 4 other nodes report "Cannot achieve consistency
> level LOCAL_ONE" constantly until I brought up the dead node. My data
> seems lost during that down time. To me this could not happen because the
> write CL is LOCAL_ONE and only one node was dead. I encountered node down
> before because of OOM issue and I believe I didn't lose data because of the
> hinted handoff feature.
>

Hi,

The problem here is at a different level: it is not that a replica of your
data could not be written, but that the coordinator could not even serve the
authentication request for the connection (see below).

One more thing: the dead node was added recently, and the only difference is
> that the other 4 nodes are behind an internal SLB (Server Load Balancer)
> with a VIP, while the new one is not.
> Our application accesses the Cassandra cluster through the SLB VIP.
>
> Any thoughts are appreciated.
>
> Best regards,
> -Simon
>
> System log:
> 57659 Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE
> 57660         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
> 57661         at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
> 57662         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
> 57663         at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
> 57664         at org.apache.cassandra.auth.RolesCache.getRoles(RolesCache.java:70) ~[apache-cassandra-2.2.8.jar:2.2.8]
> 57665         at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:51) ~[apache-cassandra-2.2.8.jar:2.2.8]
> 57666         at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-2.2.8.jar:2.2.8]
> 57667         at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-2.2.8.jar:2.2.8]
>

What are the replication settings of your system_auth keyspace?  It looks
like the node that went down held the only replica of the role data needed
to check your application's credentials/permissions, so every authentication
attempt failed with UnavailableException while it was offline.
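
If system_auth is still on the default SimpleStrategy with RF = 1, something
along these lines should fix it. This is only a sketch: 'DC1' is a placeholder
for your actual datacenter name (check `nodetool status`), and 3 is the
commonly recommended replication factor for system_auth, not something
specific to your cluster:

```
-- In cqlsh, inspect the current settings first:
DESCRIBE KEYSPACE system_auth;

-- Raise the replication factor so auth data survives a node failure:
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```

After the ALTER, run `nodetool repair system_auth` on each node so the
existing role/permission rows actually get streamed to the new replicas;
changing the replication factor alone does not move data.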

Cheers,
--
Alex
