[
https://issues.apache.org/jira/browse/IGNITE-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-26680:
---------------------------------------
Description:
The following scenario is possible:
# A node is started when Metastorage already has some revisions (for instance,
the node is restarted)
# It creates a Metastorage Raft node
# The Raft node quickly receives all Metastorage revisions
# Metastorage recovery kicks in and asks the leader for the current revision
# It sees that the current revision has already been reached, so it completes
the Metastorage recovery future right away. This happens in a Raft-ReadOnly
thread (not in the state machine thread). This causes the
recoveryRevisionsListener to be nullified
# At the same time, a new revision comes from the leader and gets applied in
the state machine thread. It tries to invoke the recoveryRevisionsListener
As can be seen, there is a race between the nullification of the listener and
its invocation. Currently, the null check and the invocation each read the
volatile field (of the listener) separately, so the command execution can see
that the listener is not null and then dereference the null reference set in
step 5, resulting in a NullPointerException.
This can be easily fixed by reading the field once into a local variable and
then doing both the null check and the invocation on that local variable.
The race still remains, but it is benign: a redundant listener invocation has
no effect (it just tries to complete the already completed future, which is a
no-op).
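The fix described above can be sketched roughly as follows. This is only an
illustration, not the actual Ignite code: the class name, the method names and
the LongConsumer listener type are assumptions, and only the field name
recoveryRevisionsListener comes from the description.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongConsumer;

// Hypothetical stand-in for the Metastorage component; not the real Ignite class.
final class RecoveryListenerSketch {
    private final CompletableFuture<Long> recoveryFuture = new CompletableFuture<>();

    // Nullified once recovery completes (step 5), possibly from another thread.
    // In this sketch the listener simply completes the recovery future.
    private volatile LongConsumer recoveryRevisionsListener =
            revision -> recoveryFuture.complete(revision);

    // Racy variant: the field is read twice, so it may become null between the reads.
    void onRevisionAppliedRacy(long revision) {
        if (recoveryRevisionsListener != null) {
            recoveryRevisionsListener.accept(revision); // may throw NullPointerException
        }
    }

    // Fixed variant: a single volatile read; the check and the call use the local copy.
    void onRevisionApplied(long revision) {
        LongConsumer listener = recoveryRevisionsListener;
        if (listener != null) {
            listener.accept(revision);
        }
    }

    // Called from the recovery path (the Raft-ReadOnly thread in the scenario).
    void completeRecovery(long revision) {
        recoveryFuture.complete(revision);
        recoveryRevisionsListener = null;
    }

    // Returns the revision the future was completed with, or -1 if not completed yet.
    long recoveredRevision() {
        return recoveryFuture.getNow(-1L);
    }

    public static void main(String[] args) {
        RecoveryListenerSketch sketch = new RecoveryListenerSketch();

        // Simulate the benign race: the state machine thread has already taken
        // its local copy of the listener when recovery completes and clears the field.
        LongConsumer stale = sketch.recoveryRevisionsListener;
        sketch.completeRecovery(5L);

        stale.accept(6L);              // redundant completion attempt, a no-op
        sketch.onRevisionApplied(7L);  // field is null now, so nothing is invoked

        System.out.println(sketch.recoveredRevision());
    }
}
```

In the fixed variant a stale local copy can still be invoked after the field
has been cleared, but completing an already completed CompletableFuture does
not change its value, which is why the remaining race is harmless.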
> Potential NullPointerException on metastorage recovery
> ------------------------------------------------------
>
> Key: IGNITE-26680
> URL: https://issues.apache.org/jira/browse/IGNITE-26680
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: MakeTeamcityGreenAgain, ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h