[
https://issues.apache.org/jira/browse/IGNITE-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071590#comment-17071590
]
Sergey Antonov commented on IGNITE-12845:
-----------------------------------------
[~alex_pl] I didn't find any {{Set#contains(Object)}} usages in
{{sun.nio.ch.SelectorImpl}} in jdk8 (1.8.0_191).
> GridNioServer can infinitely lose some events
> ----------------------------------------------
>
> Key: IGNITE-12845
> URL: https://issues.apache.org/jira/browse/IGNITE-12845
> Project: Ignite
> Issue Type: Bug
> Reporter: Aleksey Plekhanov
> Priority: Major
>
> With the selector optimization enabled ({{IGNITE_NO_SELECTOR_OPTS = false}},
> the default), {{GridNioServer}} can lose some events for a channel (depending
> on the JDK version and OS). This can cause connected applications to hang.
> Reproducer:
> {code:java}
> public void testConcurrentLoad() throws Exception {
>     startGrid(0);
>
>     try (IgniteClient client = Ignition.startClient(
>         new ClientConfiguration().setAddresses("127.0.0.1:10800"))) {
>         ClientCache<Integer, Integer> cache =
>             client.getOrCreateCache(DEFAULT_CACHE_NAME);
>
>         GridTestUtils.runMultiThreaded(
>             () -> {
>                 for (int i = 0; i < 1000; i++)
>                     cache.put(i, i);
>             }, 5, "run-async");
>     }
> }
> {code}
> This reproducer eventually hangs on macOS (tested with JDK 8, 11, 12, 13,
> and 14) and on some Linux environments: for example, it passed more than 100
> times on a desktop Linux system with JDK 8, but hangs on TeamCity agents with
> JDK 8 and 11. It has never hung on Windows (passed more than 100 times). It
> passes on all systems and JDK versions when the system property
> {{IGNITE_NO_SELECTOR_OPTS = true}} is set.
>
> The root cause: the optimized {{SelectedSelectionKeySet}} always returns
> {{false}} from its {{contains()}} method, and {{contains()}} is used by the
> {{sun.nio.ch.SelectorImpl.processReadyEvents()}} method:
> {code:java}
> if (selectedKeys.contains(ski)) {
>     if (ski.translateAndUpdateReadyOps(rOps)) {
>         return 1;
>     }
> } else {
>     ski.translateAndSetReadyOps(rOps);
>     if ((ski.nioReadyOps() & ski.nioInterestOps()) != 0) {
>         selectedKeys.add(ski);
>         return 1;
>     }
> }
> {code}
> So, with a fair implementation, if a selection key is already contained in
> the selected keys set, its ready-operation flags are updated. With
> {{SelectedSelectionKeySet}}, however, the ready-operation flags are always
> overwritten and the selection key is added again even if it is already in
> the set. Some {{SelectorImpl}} implementations can pass several events for
> one selection key to the {{processReadyEvents}} method (for example, the
> macOS implementation {{KQueueSelectorImpl}} works this way). In this case,
> duplicate selection keys are added to {{selectedKeys}} and all events except
> the last are lost.
> As a result, two bad things happen in {{GridNioServer}}:
> # Some event flags are lost and the worker doesn't perform the corresponding
> action (in the attached reproducer, the "channel is ready for reading" event
> is lost and, after some point in time, the workers never read the channel).
> # Selection keys are duplicated with the same event flags (in the attached
> reproducer, it's the "channel is ready for writing" event). This duplication
> leads to wrong processing of the {{GridSelectorNioSessionImpl#procWrite}}
> flag, which will be {{false}} in some cases while the selection key's
> {{interestedOps}} still contains the {{OP_WRITE}} operation, and this
> operation is never excluded.
> Possible solutions:
> * A fair implementation of the {{SelectedSelectionKeySet.contains}} method
> (this solves all the problems but can be resource-consuming).
> * Always set {{GridSelectorNioSessionImpl#procWrite}} to {{true}} when
> adding {{OP_WRITE}} to {{interestedOps}} (for example, in the
> {{AbstractNioClientWorker.registerWrite()}} method). In this case, some
> "channel is ready for reading" events (but not data) can still be lost, but
> not indefinitely, and the data will eventually be read.
> * Exclude {{OP_WRITE}} from {{interestedOps}} even if
> {{GridSelectorNioSessionImpl#procWrite}} is {{false}} when there are no write
> requests in the queue (see the {{GridNioServer.stopPollingForWrite()}}
> method). This solution has the same shortcomings as the previous one.
>
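For illustration only, the effect described in the quoted root-cause analysis can be sketched outside of Ignite and the JDK internals. The code below is a simplified model, not the actual {{SelectorImpl}} or {{SelectedSelectionKeySet}} code: {{FakeKey}}, {{UnfairKeySet}} and {{SelectedKeysDemo}} are hypothetical names, the interest-ops check from the quoted snippet is left out, and "update" is assumed to OR the new flags into the existing ones while "set" overwrites them.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a selection key: holds only the ready-ops bit mask. */
final class FakeKey {
    static final int OP_READ = 1;   // same values as SelectionKey.OP_READ
    static final int OP_WRITE = 4;  // and SelectionKey.OP_WRITE

    int readyOps;
}

/** Hypothetical array-backed set modelling the optimized one: contains() is always false. */
final class UnfairKeySet extends AbstractSet<FakeKey> {
    private final List<FakeKey> keys = new ArrayList<>();

    @Override public boolean add(FakeKey key) {
        keys.add(key);              // no duplicate check at all
        return true;
    }

    @Override public boolean contains(Object o) {
        return false;               // the behaviour blamed in the issue
    }

    @Override public Iterator<FakeKey> iterator() {
        return keys.iterator();
    }

    @Override public int size() {
        return keys.size();
    }
}

final class SelectedKeysDemo {
    /** Simplified model of the branch structure in the quoted processReadyEvents() snippet. */
    static void processReadyEvent(Set<FakeKey> selectedKeys, FakeKey key, int rOps) {
        if (selectedKeys.contains(key))
            key.readyOps |= rOps;   // "update": merge new flags into the existing ones
        else {
            key.readyOps = rOps;    // "set": overwrite whatever was there
            selectedKeys.add(key);
        }
    }

    /** Delivers OP_READ and then OP_WRITE for one key; returns {set size, final readyOps}. */
    static int[] simulate(Set<FakeKey> selectedKeys) {
        FakeKey key = new FakeKey();

        processReadyEvent(selectedKeys, key, FakeKey.OP_READ);
        processReadyEvent(selectedKeys, key, FakeKey.OP_WRITE);

        return new int[] {selectedKeys.size(), key.readyOps};
    }

    public static void main(String[] args) {
        int[] fair = simulate(new HashSet<>());
        int[] unfair = simulate(new UnfairKeySet());

        System.out.println("fair:   size=" + fair[0] + " readyOps=" + fair[1]);
        System.out.println("unfair: size=" + unfair[0] + " readyOps=" + unfair[1]);
    }
}
```

In this model the {{HashSet}} run ends with one key carrying both flags (size=1, readyOps=5), while the always-{{false}} set ends with the same key stored twice and only the last flag left (size=2, readyOps=4). That matches the "duplicated keys, all events except the last are lost" behaviour described in the issue. Whether a given JDK version actually reaches this code path is a separate question, which is what the comment above is about.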