[
https://issues.apache.org/jira/browse/IGNITE-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Steshin updated IGNITE-13643:
--------------------------------------
Priority: Major (was: Critical)
> Fix long closing of the socket in ServerImpl (TcpDiscoverySpi)
> --------------------------------------------------------------
>
> Key: IGNITE-13643
> URL: https://issues.apache.org/jira/browse/IGNITE-13643
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
>
> Currently, IgniteUtils.closeQuiet(@Nullable Socket sock) takes about 5 sec
> to close a socket. This is probably the default soTimeout. The delay defeats
> timely node failure detection: although we set
> failureDetectionTimeout == 1000, node failure is detected in about 6.5 sec
> on average. Logging shows that the delay occurs while closing the socket in
> IgniteUtils.closeQuiet(@Nullable Socket sock).
> This time gap was unearthed by a discovery integration test on ducktape [1].
> In that test, the failure detection timeout is set to 1000 ms.
> Typical results before the fix for 1 node:
> "Detection of node(s) failure (ms)": 6140, "All detection delays (ms):":
> "[6140]", "Nodes failed": 1}
> Typical results after the fix for 1 node:
> "Detection of node(s) failure (ms)": 1004, "All detection delays (ms):":
> "[1004]", "Nodes failed": 1}
> Suggestion: use forced closing, i.e. set soLinger=0 and do not wait for the
> rest of the socket I/O. We close the socket in TcpDiscoverySpi only after we
> have already waited for the target timeouts and consider the connection lost
> or invalid. We do not need to wait for any more traffic on the socket.
> There is a note that 'graceful' socket closing was introduced to work around
> a bug in OpenJDK 12 [2]. But as far as I can see, it has been fixed.
> [1]
> https://github.com/apache/ignite/blob/ignite-ducktape/modules/ducktests/tests/ignitetest/tests/discovery_test.py
> [2] https://bugs.openjdk.java.net/browse/JDK-8219658
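
The forced close suggested in the description can be sketched roughly as follows. This is a minimal illustration, not the actual Ignite patch: the class name ForcedClose and the helper closeForcibly are hypothetical, and the sketch only assumes the standard java.net.Socket API. Enabling SO_LINGER with a zero timeout makes close() discard unsent data and reset the connection instead of waiting for pending I/O.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ForcedClose {
    /**
     * Hypothetical helper (not Ignite's API): closes the socket without
     * lingering and swallows any I/O error, in the spirit of closeQuiet.
     */
    public static void closeForcibly(Socket sock) {
        if (sock == null)
            return;

        try {
            // SO_LINGER on with a 0 timeout: close() drops unsent data
            // and resets the connection rather than waiting for it to drain.
            sock.setSoLinger(true, 0);
        }
        catch (IOException ignored) {
            // The socket may already be closed; still attempt close() below.
        }

        try {
            sock.close();
        }
        catch (IOException ignored) {
            // Quiet close: nothing useful to do with the error here.
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback-only demo: connect a client to a local server socket,
        // then measure how long the forced close of the client takes.
        try (ServerSocket srv = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(srv.getInetAddress(), srv.getLocalPort());
            Socket accepted = srv.accept();

            long start = System.nanoTime();
            closeForcibly(client);
            long tookMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("closed=" + client.isClosed() + ", took " + tookMs + " ms");

            closeForcibly(accepted);
        }
    }
}
```

The loopback demo in main only shows how the helper is called; it does not reproduce the 5 sec delay reported above, which was observed against a failed remote node.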
--
This message was sent by Atlassian Jira
(v8.3.4#803005)