[ 
https://issues.apache.org/jira/browse/CASSANDRA-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366661#comment-17366661
 ] 

Stefan Miklosovic commented on CASSANDRA-16518:
-----------------------------------------------

I was not able to reproduce this on 3.11.10. If you check the source of these 
messages, they come from DynamicLimit, which extends ConfiguredLimit, in its 
private method "maybeUpdateVersion":

{code}
        private void maybeUpdateVersion(boolean allowLowering)
        {
            boolean enforceV3Cap = SystemKeyspace.loadPeerVersions()
                                                 .values()
                                                 .stream()
                                                 .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0);

            if (!enforceV3Cap)
            {
                maxVersion = ProtocolVersion.MAX_SUPPORTED_VERSION;
                return;
            }

            if (ProtocolVersion.V3.isSmallerThan(maxVersion) && !allowLowering)
            {
                logger.info("Detected peers which do not fully support protocol V4, but V4 was previously negotiable. " +
                            "Not enforcing cap as this can cause issues for older client versions. After the next " +
                            "restart the server will apply the cap");
                return;
            }

            logger.info("Detected peers which do not fully support protocol V4. Capping max negotiable version to V3");
            maxVersion = ProtocolVersion.V3;
        }
    }
{code}

So in order to reach the place where the first message is logged, 
"enforceV3Cap" has to be true, so that the if (!enforceV3Cap) branch is skipped. 
Then allowLowering has to be false, which means maybeUpdateVersion must be 
called with allowLowering set to false. That only ever happens in this 
DynamicLimit method:

{code}
        public void updateMaxSupportedVersion()
        {
            maybeUpdateVersion(false);
        }
{code}
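
To make the three possible outcomes of maybeUpdateVersion easier to follow, here is a minimal standalone sketch of the same branching. It is not Cassandra code: the class VersionCapSketch, the Outcome enum and the plain int protocol versions are hypothetical stand-ins chosen for illustration.

```java
// Standalone sketch of the branching in maybeUpdateVersion.
// Assumption: protocol versions are modelled as plain ints (3 = V3, 4 = V4);
// all names in this class are hypothetical and do not exist in Cassandra.
final class VersionCapSketch
{
    enum Outcome { NO_CAP, CAP_DEFERRED, CAPPED_TO_V3 }

    static final int V3 = 3;
    static final int V4 = 4;

    static Outcome decide(boolean anyPeerBelowMinVersion, int currentMax, boolean allowLowering)
    {
        // Mirrors "if (!enforceV3Cap)": no old peers, so the max version is not capped.
        if (!anyPeerBelowMinVersion)
            return Outcome.NO_CAP;

        // Mirrors "V3.isSmallerThan(maxVersion) && !allowLowering": first log message.
        if (V3 < currentMax && !allowLowering)
            return Outcome.CAP_DEFERRED;

        // Everything else falls through to the cap: second log message.
        return Outcome.CAPPED_TO_V3;
    }

    public static void main(String[] args)
    {
        // Running node (allowLowering = false) that already negotiates V4.
        System.out.println(decide(true, V4, false));
        // Node start-up (allowLowering = true).
        System.out.println(decide(true, V4, true));
    }
}
```

In this model the first message corresponds to CAP_DEFERRED and can only be produced with allowLowering set to false, while the second message corresponds to CAPPED_TO_V3.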

updateMaxSupportedVersion is only ever called in NativeTransportService:

{code}
    public void refreshMaxNegotiableProtocolVersion()
    {
        // lowering the max negotiable protocol version is only safe if we haven't already
        // allowed clients to connect with a higher version. This still allows the max
        // version to be raised, as that is safe.
        if (initialized)
            protocolVersionLimit.updateMaxSupportedVersion();
    }
{code}

That is called in CassandraDaemon:

{code}
    public void refreshMaxNativeProtocolVersion()
    {
        if (nativeTransportService != null)
            nativeTransportService.refreshMaxNegotiableProtocolVersion();
    }
{code}

That is called in StorageService:

{code}
    private void refreshMaxNativeProtocolVersion()
    {
        if (daemon != null)
        {
            daemon.refreshMaxNativeProtocolVersion();
        }
    }
{code}

Now this is finally called in two places:

1) In StorageService#onChange, where a node receives information about a peer 
via Gossip:

{code}
    case RELEASE_VERSION:
        SystemKeyspace.updatePeerReleaseVersion(endpoint, value.value, this::refreshMaxNativeProtocolVersion, executor);
{code}

2) In StorageService#updatePeerInfo, which is called from 
StorageService#handleStateNormal, so it covers the case when a node enters 
NORMAL state.

Taking the first place for now, let's see what it is doing:

{code}
    public static void updatePeerReleaseVersion(final InetAddress ep, final Object value, Runnable postUpdateTask, ExecutorService executorService)
    {
        if (ep.equals(FBUtilities.getBroadcastAddress()))
            return;

        String req = "INSERT INTO system.%s (peer, release_version) VALUES (?, ?)";
        executorService.execute(() -> {
            executeInternal(String.format(req, PEERS), ep, value);
            postUpdateTask.run();
        });
    }
{code}

So, if the peer is not myself, insert the release version of that peer into 
system.peers AND AFTER THAT run the post update task, which happens to be our 
refreshMaxNativeProtocolVersion method.
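
The ordering described above (write first, then the post update task, both inside one executor task) can be sketched in isolation. This is only an illustration: PostUpdateOrderingSketch and runUpdate are hypothetical names, and the "insert" is a recorded string rather than a real write to system.peers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Standalone sketch of "insert, then run the post update task" on an executor.
// Assumption: the write is replaced by an entry in an event list; nothing here is Cassandra code.
final class PostUpdateOrderingSketch
{
    static List<String> runUpdate(Object releaseVersion)
    {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Both steps live in the same task, so the callback can never run before the write.
        executor.execute(() -> {
            events.add("insert release_version=" + releaseVersion);
            events.add("postUpdateTask");
        });

        executor.shutdown();
        try
        {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the update task", e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args)
    {
        System.out.println(runUpdate("3.11.6"));
        // A null value is passed through to the write unchanged, and the callback still runs.
        System.out.println(runUpdate(null));
    }
}
```

Note that in this sketch nothing validates the value before it is written, so a null release version would reach the write and the callback would then observe it.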

So as I said before, for the code to get far enough to actually log that 
first message, enforceV3Cap has to be true in the first place, so the 
result of this has to be true:

{code}
    boolean enforceV3Cap = SystemKeyspace.loadPeerVersions()
        .values()
        .stream()
        .anyMatch(v -> v.compareTo(MIN_VERSION_FOR_V4) < 0);
{code}

Hence this clearly means that some version found in the peers table has to be 
lower than MIN_VERSION_FOR_V4, which is:

static final CassandraVersion MIN_VERSION_FOR_V4 = new CassandraVersion("3.0.0");

But I wonder how you could even get such a version (hypothetically 
lower than 3.0.0), because you said that all nodes in your cluster run 
the same version, so let's dig a bit deeper:

{code}
    /**
     * Return a map of IP address to C* version. If an invalid version string, or no version
     * at all is stored for a given peer IP, then NULL_VERSION will be reported for that peer
     */
    public static Map<InetAddress, CassandraVersion> loadPeerVersions()
    {
        Map<InetAddress, CassandraVersion> releaseVersionMap = new HashMap<>();
        for (UntypedResultSet.Row row : executeInternal("SELECT peer, release_version FROM system." + PEERS))
        {
            InetAddress peer = row.getInetAddress("peer");
            if (row.has("release_version"))
            {
                try
                {
                    releaseVersionMap.put(peer, new CassandraVersion(row.getString("release_version")));
                }
                catch (IllegalArgumentException e)
                {
                    logger.info("Invalid version string found for {}", peer);
                    releaseVersionMap.put(peer, NULL_VERSION);
                }
            }
            else
            {
                logger.info("No version string found for {}", peer);
                releaseVersionMap.put(peer, NULL_VERSION);
            }
        }
        return releaseVersionMap;
    }
{code}

So if all nodes are on a release newer than 3.0.0, the only place where we 
would get a version lower than 3.0.0 is, as you debugged, the returned 
NULL_VERSION.
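
A minimal sketch of why a single peer without a stored version is enough to trigger the cap, even when every real release is 3.11.6. The assumptions here are illustrative only: versions are simple "major.minor.patch" strings, the NULL_VERSION sentinel is modelled as "0.0.0", and PeerVersionCheckSketch with its methods is a hypothetical stand-in for the real CassandraVersion comparison.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the enforceV3Cap check over the peers table.
// Assumption: the sentinel value "0.0.0" and the simplified comparison are
// illustrative; they are not the actual CassandraVersion implementation.
final class PeerVersionCheckSketch
{
    static final String NULL_VERSION = "0.0.0";
    static final String MIN_VERSION_FOR_V4 = "3.0.0";

    // Compares two "major.minor.patch" strings numerically, component by component.
    static int compare(String a, String b)
    {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < 3; i++)
        {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0)
                return c;
        }
        return 0;
    }

    // A peer with no stored version is reported as NULL_VERSION, as in loadPeerVersions.
    static boolean enforceV3Cap(Map<String, String> peerVersions)
    {
        return peerVersions.values()
                           .stream()
                           .map(v -> v == null ? NULL_VERSION : v)
                           .anyMatch(v -> compare(v, MIN_VERSION_FOR_V4) < 0);
    }

    public static void main(String[] args)
    {
        Map<String, String> peers = new HashMap<>();
        peers.put("10.0.0.1", "3.11.6");
        peers.put("10.0.0.2", "3.11.6");
        System.out.println(enforceV3Cap(peers)); // all peers at 3.11.6

        peers.put("10.0.0.3", null);             // peer row without a release_version
        System.out.println(enforceV3Cap(peers));
    }
}
```

With only 3.11.6 peers the check is false; adding one peer whose version is missing flips it to true, which matches the behaviour described in the ticket.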

But I am failing to see how that is possible, because if we recall what 
updatePeerReleaseVersion does:

{code}
        executorService.execute(() -> {
            executeInternal(String.format(req, PEERS), ep, value);
            postUpdateTask.run();
        });
{code}

So we insert the record into system.peers but we fail to read it back? How is 
that possible? Any ideas?

This might ultimately mean that we insert NULL into the DB, which in turn 
means that onChange was invoked with RELEASE_VERSION state but its 
VersionedValue.value returned null, which is even more interesting.

To get the second log message (logger.info("Detected peers which do not fully 
support protocol V4. Capping max negotiable version to V3");), 
allowLowering has to be true. maybeUpdateVersion is called that way only in 
DynamicLimit's constructor, which runs when a node starts. At that point the 
node has most likely not yet received any state changes from the other nodes, 
so system.peers has to contain that invalid value already.

Hence it _seems_ like a joining node propagates its release version as null 
while it is joining, or a restarted node receives, upon its start, a 
RELEASE_VERSION which is null while another node is joining.


> Node restart during joining sets protocol version to V3
> -------------------------------------------------------
>
>                 Key: CASSANDRA-16518
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16518
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Joseph Clay
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.x
>
>
> While joining nodes to a cluster, an old node crashed. The old node was 
> recovered however clients (datastax java) refused to connect to it.
> The driver error:
> {noformat}
> Detected added or restarted Cassandra host /<ip>:<port> but ignoring it since 
> it does not support the version V4 of the native protocol which is currently 
> in use.{noformat}
> In the recovered node cassandra logs:
> {noformat}
> INFO  o.a.c.transport.ConfiguredLimit Detected peers which do not fully 
> support protocol V4. Capping max negotiable version to V3{noformat}
> I confirmed that ALL the nodes in the cluster, joining or otherwise, were 
> apache-cassandra-3.11.6 so that error message was rather confusing.
>  Eventually after digging through the code we got to the bottom of the issue:
> https://issues.apache.org/jira/browse/CASSANDRA-15193 adds a check for node 
> version, which reverts the protocol version to V3 if any peer fails the 
> version check. Joining nodes have NULL for their version in the peers table, 
> which fails the version check.
>  


