[ https://issues.apache.org/jira/browse/IGNITE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-15767:
-----------------------------------
    Description: 
See [https://bugs.openjdk.java.net/browse/JDK-8247750]

ServerSocket.accept() with no timeout may throw SocketTimeoutException when the
process receives a signal. This can cause an unexpected exception in the TCP
discovery server thread:
{noformat}
[09:52:26,301][SEVERE][tcp-disco-srvr-[:47500]-#3-#71][TcpDiscoverySpi] Failed to accept TCP connection.
[09:52:26,301][SEVERE][tcp-disco-srvr-[:47500]-#3-#71][TcpDiscoverySpi] Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
	at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
	at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6750)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6673)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:57)
{noformat}

This has been reported on StackOverflow and on the user mailing list, and we
also see it when running Docker on Alpine 3.15. The speculation is that it is
caused by a combination of a specific Linux kernel version and environment.

Side note: according to strace analysis on Alpine, the process does not even
receive any signals; it may be that Alpine interrupts the syscall "as if" by a
signal.

Surprisingly, the bug is not fixed in JDK 11. The workaround is easy, though:
wrap the accept() call in a try-catch and retry if the unexpected
SocketTimeoutException is thrown.
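A minimal sketch of that workaround, for illustration only. The class and method names (AcceptRetry, acceptWithRetry) are hypothetical and this is not the actual Ignite change. It assumes a timeout is spurious only when no SO_TIMEOUT is configured on the server socket (getSoTimeout() returns 0), so a timeout the caller explicitly asked for is still propagated.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper illustrating the JDK-8247750 workaround (not Ignite's actual code). */
public final class AcceptRetry {
    private AcceptRetry() {
        // Static utility, no instances.
    }

    /**
     * Calls srv.accept(), retrying when it throws SocketTimeoutException even though
     * no accept timeout is configured (SO_TIMEOUT == 0), which is the JDK-8247750 symptom.
     */
    public static Socket acceptWithRetry(ServerSocket srv) throws IOException {
        while (true) {
            try {
                return srv.accept();
            }
            catch (SocketTimeoutException e) {
                // A timeout the caller asked for via setSoTimeout() is legitimate: propagate it.
                if (srv.getSoTimeout() > 0)
                    throw e;

                // Let a thread that is being stopped exit instead of retrying forever.
                if (Thread.currentThread().isInterrupted())
                    throw e;

                // Otherwise the timeout is spurious: loop and block in accept() again.
            }
        }
    }
}
```

Each retry simply blocks in accept() again, so the discovery server thread keeps accepting connections instead of failing on the spurious timeout.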



> Need to workaround JDK bug JDK-8247750
> --------------------------------------
>
>                 Key: IGNITE-15767
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15767
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>             Fix For: 2.13



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
