[ https://issues.apache.org/jira/browse/GEODE-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Burcham updated GEODE-8020:
--------------------------------
    Description: 
Update (May 8, 2020): the main problem described here seems to occur only on
JDK8 when TLSv1 is used. JDK11 with TLSv1 does not exhibit the problem, and
neither JDK shows it when TLSv1.2 is used. This issue is marked resolved, but
the problem still occurs on JDK8 with TLSv1, so we recommend that customers use
TLSv1.2 or later. Other buffering problems were found during this
investigation, and a PR was merged to address them.
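The recommended mitigation is to avoid TLSv1. At the plain JSSE level, limiting an SSLEngine to TLSv1.2 and later can be sketched as below. This is generic Java, not Geode code: in a Geode deployment the protocol list would normally come from Geode's own SSL settings (for example its ssl-protocols property) rather than from application code, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class RestrictTlsProtocols {

  // Protocols this sketch accepts; TLSv1 and TLSv1.1 are deliberately excluded.
  private static final List<String> ALLOWED = Arrays.asList("TLSv1.3", "TLSv1.2");

  /** Enables only those allowed protocols that this JDK's engine actually supports. */
  static SSLEngine restrict(SSLEngine engine) {
    List<String> enabled = new ArrayList<>();
    for (String protocol : engine.getSupportedProtocols()) {
      if (ALLOWED.contains(protocol)) {
        enabled.add(protocol);
      }
    }
    if (enabled.isEmpty()) {
      throw new IllegalStateException("This JDK supports neither TLSv1.2 nor TLSv1.3");
    }
    engine.setEnabledProtocols(enabled.toArray(new String[0]));
    return engine;
  }

  public static void main(String[] args) throws Exception {
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, null, null); // default key and trust managers
    SSLEngine engine = restrict(context.createSSLEngine());
    // On JDK8 builds without TLSv1.3 this prints only TLSv1.2.
    System.out.println(Arrays.toString(engine.getEnabledProtocols()));
  }
}
```

Filtering against getSupportedProtocols() keeps the same code usable on JDK8 builds that lack TLSv1.3, where passing an unsupported name to setEnabledProtocols would throw.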

When running an application with SSL enabled, I ran into a hang with a lost
message. The sender logged a 15-second ack-wait warning pointing to another
server in the cluster. That server had this in its log file at the time the
message would have been processed:

{noformat}
[info 2020/04/21 11:22:39.437 PDT <P2P message reader for rs-bschuchardt-1053-hydra-client-1(bridgegemfire4_host1_12599:12599)<ec><v1>:41003 unshared ordered uid=354 dom #2 port=55262> tid=0xad] P2P message reader@2580db5f io exception for rs-bschuchardt-1053-hydra-client-1(bridgegemfire4_host1_12599:12599)<ec><v1>:41003@354(GEODE 1.10.0)
javax.net.ssl.SSLException: bad record MAC
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:214)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:986)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:912)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:782)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626)
        at org.apache.geode.internal.net.NioSslEngine.unwrap(NioSslEngine.java:275)
        at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2894)
        at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
        at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: javax.crypto.BadPaddingException: bad record MAC
        at sun.security.ssl.InputRecord.decrypt(InputRecord.java:219)
        at sun.security.ssl.EngineInputRecord.decrypt(EngineInputRecord.java:177)
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:979)
        ... 10 more
{noformat}
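The "Caused by" line shows that the SSLException wraps a BadPaddingException raised while decrypting a received record: the record failed its MAC (integrity) check. That typically means the ciphertext handed to unwrap() was not byte-for-byte what the peer produced, which is consistent with the buffer-corruption theory in this issue. When triaging logs, a hypothetical helper like the one below (names are illustrative, not part of Geode) can separate these integrity failures from other TLS errors by walking the cause chain.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import javax.crypto.BadPaddingException;
import javax.net.ssl.SSLException;

public class SslFailureClassifier {

  /**
   * Returns true if the throwable, or any cause in its chain, is a BadPaddingException.
   * In the JDK8 trace above, this is how SSLEngine surfaced "bad record MAC".
   */
  static boolean isRecordIntegrityFailure(Throwable t) {
    // Identity set guards against cyclic cause chains.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
    for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
      if (cur instanceof BadPaddingException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Mirrors the chain in the log: SSLException caused by BadPaddingException.
    SSLException observed =
        new SSLException("bad record MAC", new BadPaddingException("bad record MAC"));
    System.out.println(isRecordIntegrityFailure(observed)); // true
    System.out.println(isRecordIntegrityFailure(new SSLException("some other failure"))); // false
  }
}
```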

I bisected to see when this problem was introduced and found it was this commit:

{noformat}
commit 418d929e3e03185cd6330c828c9b9ed395a76d4b
Author: Mario Ivanac <48509724+miva...@users.noreply.github.com>
Date:   Fri Nov 1 20:28:57 2019 +0100

    GEODE-6661: Fixed use of Direct and Non-Direct buffers (#4267)

    - Fixed use of Direct and Non-Direct buffers
{noformat}

That commit modified NioSslEngine to use a "direct" byte buffer instead of a
heap byte buffer. If I revert that one part of the PR, the test passes.
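Heap and direct buffers behave the same through ByteBuffer's relative get/put methods, but only heap buffers expose a backing array, so code that reaches for array() or assumes a particular buffer kind when copying or resizing can misbehave once the kind changes. The sketch below illustrates that class of pitfall with a hypothetical expand() helper that preserves both the buffer kind and its buffered bytes, assuming the buffer is kept in "write mode" (position marks the end of the data). It is not Geode's NioSslEngine code and not the actual fix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferKinds {

  /**
   * Hypothetical helper: returns a buffer with at least newCapacity bytes, of the same kind
   * (direct or heap) as the original, holding the original's bytes [0, position) and left in
   * write mode. The original buffer's position and limit are modified by the copy.
   */
  static ByteBuffer expand(ByteBuffer original, int newCapacity) {
    if (original.capacity() >= newCapacity) {
      return original;
    }
    ByteBuffer bigger = original.isDirect()
        ? ByteBuffer.allocateDirect(newCapacity)
        : ByteBuffer.allocate(newCapacity);
    original.flip(); // read mode: limit = old position, position = 0
    bigger.put(original); // relative copy works for both kinds; array() would not
    return bigger;
  }

  public static void main(String[] args) {
    System.out.println(ByteBuffer.allocate(8).hasArray()); // true
    // false: calling array() on a direct buffer throws UnsupportedOperationException
    System.out.println(ByteBuffer.allocateDirect(8).hasArray());

    ByteBuffer direct = ByteBuffer.allocateDirect(4);
    direct.put("abc".getBytes(StandardCharsets.US_ASCII));
    ByteBuffer grown = expand(direct, 32);
    grown.flip();
    byte[] out = new byte[grown.remaining()];
    grown.get(out);
    System.out.println(grown.isDirect() + " " + new String(out, StandardCharsets.US_ASCII)); // true abc
  }
}
```

Choosing the replacement's kind from original.isDirect(), rather than hard-coding allocate() or allocateDirect(), keeps callers that depend on one kind from silently receiving the other.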


> buffer corruption in SSL communications
> ---------------------------------------
>
>                 Key: GEODE-8020
>                 URL: https://issues.apache.org/jira/browse/GEODE-8020
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>            Reporter: Bruce J Schuchardt
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>             Fix For: 1.14.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
