[
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavel Voronkin updated IGNITE-11288:
------------------------------------
Description:
The root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock:
// We create the socket with soTimeout(0) here, but setting it here won't help
anyway.
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
// After the timeout, grid-timeout-worker blocks forever because
SSLSocketImpl.close(), called from onTimeout, hangs on writeLock.
According to the Java 8 SSLSocketImpl:
{code}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}
{code}
If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits
forever because RingMessageWorker is writing a message with SO_TIMEOUT set to zero.
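The difference between the two branches can be shown in isolation. The following is a minimal sketch, not the JDK or Ignite code: the class WriteLockSketch, the method closerAcquiresLock, and the latch-held "stalled-writer" thread are hypothetical stand-ins for a writer that never releases writeLock. It only illustrates that a timed tryLock() returns after the linger interval, whereas an untimed lock() in the same position would never return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class WriteLockSketch {
    /**
     * Simulates close() racing with a writer that is stuck while holding the write lock.
     * Returns true if the closing thread obtained the lock within lingerSeconds.
     */
    static boolean closerAcquiresLock(int lingerSeconds) {
        ReentrantLock writeLock = new ReentrantLock();
        CountDownLatch writerHoldsLock = new CountDownLatch(1);
        CountDownLatch releaseWriter = new CountDownLatch(1);

        // Hypothetical stand-in for a thread blocked in a socket write.
        Thread writer = new Thread(() -> {
            writeLock.lock();
            try {
                writerHoldsLock.countDown();
                releaseWriter.await(); // the "write" does not finish on its own
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                writeLock.unlock();
            }
        }, "stalled-writer");
        writer.setDaemon(true);
        writer.start();

        try {
            writerHoldsLock.await();
            // Bounded wait, analogous to the SO_LINGER >= 0 branch.
            // An untimed writeLock.lock() here would block until the writer gives up the lock.
            boolean acquired = writeLock.tryLock(lingerSeconds, TimeUnit.SECONDS);
            if (acquired) {
                writeLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        } finally {
            releaseWriter.countDown(); // let the simulated writer exit
        }
    }

    public static void main(String[] args) {
        System.out.println("closer acquired lock: " + closerAcquiresLock(1));
    }
}
```

With the writer stalled, the timed attempt gives up after the linger interval and reports that the lock was not acquired, so the closing thread can continue instead of hanging.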
Solution:
1) Set a proper SO_TIMEOUT // we checked that this didn't help on Linux when
dropping packets with iptables.
2) Set SO_LINGER to some reasonable positive value.
Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261].
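Both options in the solution map to standard java.net.Socket setters, which SSLSocket inherits. The following is a minimal sketch under stated assumptions: the class DiscoverySocketOptionsSketch, its methods, and the values 5 seconds and 10000 ms are illustrative only and are not taken from the Ignite code base or from the eventual fix. It uses an unconnected plain Socket so that it runs without a network peer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class DiscoverySocketOptionsSketch {
    /** Applies a bounded linger interval and a read timeout to the given socket. */
    static void applyTimeouts(Socket sock, int lingerSeconds, int soTimeoutMillis) throws IOException {
        sock.setSoLinger(true, lingerSeconds); // getSoLinger() becomes >= 0
        sock.setSoTimeout(soTimeoutMillis);    // 0 would mean "wait indefinitely" on reads
    }

    /** Returns {SO_LINGER, SO_TIMEOUT} of a fresh socket that was never configured. */
    static int[] defaultOptions() {
        try (Socket sock = new Socket()) {
            return new int[] { sock.getSoLinger(), sock.getSoTimeout() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {SO_LINGER, SO_TIMEOUT} read back after applying the given values. */
    static int[] configuredOptions(int lingerSeconds, int soTimeoutMillis) {
        try (Socket sock = new Socket()) {
            applyTimeouts(sock, lingerSeconds, soTimeoutMillis);
            return new int[] { sock.getSoLinger(), sock.getSoTimeout() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] before = defaultOptions();
        int[] after = configuredOptions(5, 10_000);
        System.out.println("default:    linger=" + before[0] + " timeout=" + before[1]);
        System.out.println("configured: linger=" + after[0] + " timeout=" + after[1]);
    }
}
```

By default getSoLinger() reports a negative value (linger disabled), which is the case that selects the untimed writeLock.lock() branch in the snippet quoted above. Enabling linger with a non-negative interval selects the timed tryLock() branch instead.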
was:
The root cause is that we do not set SO_TIMEOUT on the discovery socket on retry:
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
So the ring message worker blocks forever, and SSLSocketImpl.close(), called
from onTimeout, hangs on writeLock.
According to the Java 8 SSLSocketImpl:
{code}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}
{code}
If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits
forever because RingMessageWorker is writing a message with SO_TIMEOUT set to zero.
With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative.
Solution:
1) Set a proper SO_TIMEOUT.
2) Possibly add the ability to override SO_LINGER with some reasonable value.
Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261].
> TcpDiscovery deadlock on SSLSocket.close().
> -------------------------------------------
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
> Issue Type: Bug
> Reporter: Pavel Voronkin
> Assignee: Pavel Voronkin
> Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)