Mikhail Petrov created IGNITE-15966:
---------------------------------------
Summary: [Security] Node can hang with authentication enabled after user drop operation
Key: IGNITE-15966
URL: https://issues.apache.org/jira/browse/IGNITE-15966
Project: Ignite
Issue Type: Bug
Environment:
Reporter: Mikhail Petrov
Reproducer:
{code:java}
/** */
public class UserDropTest extends GridCommonAbstractTest {
    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setAuthenticationEnabled(true);

        cfg.setDataStorageConfiguration(new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setPersistenceEnabled(true)));

        return cfg;
    }

    /** */
    @Test
    public void test() throws Exception {
        startGrid(0);
        startGrid(1);

        grid(0).cluster().state(ClusterState.ACTIVE);

        grid(0).createCache(DEFAULT_CACHE_NAME);

        // Create the "cli" user while authenticated as "ignite".
        try (AutoCloseable ignored = withSecurityContextOnAllNodes(authenticate(grid(0), "ignite", "ignite"))) {
            grid(0).context().security().createUser("cli", "pwd".toCharArray());
        }

        // Connect a thin client as the newly created user.
        IgniteClient client = Ignition.startClient(new ClientConfiguration()
            .setAddresses("127.0.0.1:10800")
            .setUserName("cli")
            .setUserPassword("pwd"));

        ClientCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);

        // Drop the user while its client is still connected.
        try (AutoCloseable ignored = withSecurityContextOnAllNodes(authenticate(grid(0), "ignite", "ignite"))) {
            grid(0).context().security().dropUser("cli");
        }

        Map<Integer, Integer> entries = new HashMap<>();

        for (int i = 0; i < 10000; i++)
            entries.put(i, i);

        // Operation initiated by the already dropped user.
        cache.putAll(entries);
    }

    /** {@inheritDoc} */
    @Override protected void beforeTest() throws Exception {
        super.beforeTest();

        cleanPersistenceDir();
    }
}
{code}
Exception:
{code:java}
[2021-11-22 11:04:32,390][ERROR][sys-stripe-3-#92%ignite.UserDropTest1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Failed to find security context for subject with given ID : 0898b227-30d5-3afc-9394-d8e4889ece4a]]
java.lang.IllegalStateException: Failed to find security context for subject with given ID : 0898b227-30d5-3afc-9394-d8e4889ece4a
    at org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:167)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1906)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
    at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:748)
{code}
The main problem is:
The authentication plugin implementation ties each security user to a subject ID
that is propagated across cluster nodes.
If a node receives an operation initiated by a deleted user, it fails to obtain
the security context by subject ID, because it has already been deleted, and
hangs with the exception shown above.
This exposes a problem in the security implementation: we have no mechanism to
determine that a security subject is no longer needed and can be safely removed,
and at the same time, when a security subject is not found, we throw an
unrecoverable exception that kills the system worker and hangs the node.
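The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Ignite code: the class `SubjectRegistrySketch`, its methods, and the way the subject ID is derived from the login are all assumptions made for illustration. It contrasts a lookup that throws an unchecked exception when the subject is missing with one that reports the absence to the caller, so that a single operation could be rejected instead of terminating the worker.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of a subject-ID-to-security-context registry. Not Ignite code. */
class SubjectRegistrySketch {
    /** Security contexts keyed by subject ID; the "context" here is just the user login. */
    private final Map<UUID, String> ctxs = new ConcurrentHashMap<>();

    /** Registers a user. Deriving the ID from the login is an assumption of this sketch. */
    UUID register(String login) {
        UUID subjId = UUID.nameUUIDFromBytes(login.getBytes(StandardCharsets.UTF_8));

        ctxs.put(subjId, login);

        return subjId;
    }

    /** Drops a user; its security context disappears immediately. */
    void drop(UUID subjId) {
        ctxs.remove(subjId);
    }

    /** Strict lookup: a missing subject surfaces as an unchecked exception in the calling thread. */
    String strictLookup(UUID subjId) {
        String ctx = ctxs.get(subjId);

        if (ctx == null) {
            throw new IllegalStateException(
                "Failed to find security context for subject with given ID : " + subjId);
        }

        return ctx;
    }

    /** Lenient lookup: the caller decides how to handle a missing subject. */
    Optional<String> lenientLookup(UUID subjId) {
        return Optional.ofNullable(ctxs.get(subjId));
    }
}
```

In this model, an operation that still carries the ID of a dropped user makes `strictLookup` throw in whichever thread processes the message, while `lenientLookup` returns an empty `Optional` that the caller can turn into an ordinary error for that one operation. Whether such handling is appropriate for Ignite is a design question this sketch does not answer.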