[ 
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9825:
--------------------------------
    Description: 
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
sender and locator and receiver use different configuration parameters. Set 
{{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the receiver. 
Oh and just skip the call to {{{}securityProperties(){}}}—we want to induce the 
"Unknown header byte" exception—we don't want the TLS framework throwing 
exceptions. See attached patch file GEODE-9825-demo.patch for an example.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

Before the changes for GEODE-9141 were introduced, the line of code referenced 
above used to be this snippet in {{Connection.compactOrResizeBuffer(int 
messageLength)}} (that method has since been removed):
{code:java}
     // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    } {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

But the code inside 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing something 
like:
{code:java}
newBuffer.clear();
newBuffer.put(existing);
newBuffer.flip();
releaseBuffer(type, existing);
return newBuffer; {code}
It's not clear to me, exactly, what the difference is between the old and new 
code. It's not sufficient to simply call {{flip()}} on the inputBuffer before 
returning it (I tried it and it didn't fix the bug). More work is needed.
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

The attached patch file GEODE-9825-demo.patch shows a quick hack to 
{{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.

  was:
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
sender and locator and receiver use different configuration parameters. Set 
{{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the receiver. 
Oh and just skip the call to {{{}securityProperties(){}}}—we want to induce the 
"Unknown header byte" exception—we don't want the TLS framework throwing 
exceptions. See attached patch file GEODE-9825-demo.patch for an example.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

Before the changes for GEODE-9141 were introduced, the line of code referenced 
above used to be this snippet in {{Connection.compactOrResizeBuffer(int 
messageLength)}} (that method has since been removed):
{code:java}
     // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    } {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

But the code inside {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded() 
}}is doing something like:
{code:java}
newBuffer.clear();
newBuffer.put(existing);
newBuffer.flip();
releaseBuffer(type, existing);
return newBuffer; {code}
It's not clear to me, exactly, what the difference is between the old and new 
code. It's not sufficient to simply call {{flip()}} on the inputBuffer before 
returning it (I tried it and it didn't fix the bug). More work is needed.
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

The attached patch file GEODE-9825-demo.patch shows a quick hack to 
{{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.


> Disparate socket-buffer-size Results in "IOException: Unknown header byte" 
> and Hangs
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-9825
>                 URL: https://issues.apache.org/jira/browse/GEODE-9825
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>    Affects Versions: 1.12.4, 1.15.0
>            Reporter: Bill Burcham
>            Priority: Major
>         Attachments: GEODE-9825-demo.patch
>
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
> byte..."}} and hangs if members are configured with different 
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
> sender and locator and receiver use different configuration parameters. Set 
> {{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the 
> receiver. Oh and just skip the call to {{{}securityProperties(){}}}—we want 
> to induce the "Unknown header byte" exception—we don't want the TLS framework 
> throwing exceptions. See attached patch file GEODE-9825-demo.patch for an 
> example.
> h2. Analysis
> In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
> messages it can from the current input buffer, it then considers whether the 
> buffer needs expansion. If it does then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> Is executed and the method returns. The caller then expects to be able to 
> _write_ bytes into {{{}inputBuffer{}}}.
> The problem, it seems, is that 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave 
> the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be 
> _read_ not written.
> Before the changes for GEODE-9141 were introduced, the line of code 
> referenced above used to be this snippet in 
> {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has 
> since been removed):
> {code:java}
>      // need a bigger buffer
>     logger.info("Allocating larger network read buffer, new size is {} old 
> size was {}.",
>         allocSize, oldBufferSize);
>     ByteBuffer oldBuffer = inputBuffer;
>     inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
>     if (oldBuffer != null) {
>       int oldByteCount = oldBuffer.remaining();
>       inputBuffer.put(oldBuffer);
>       inputBuffer.position(oldByteCount);
>       getBufferPool().releaseReceiveBuffer(oldBuffer);
>     } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> But the code inside 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing 
> something like:
> {code:java}
> newBuffer.clear();
> newBuffer.put(existing);
> newBuffer.flip();
> releaseBuffer(type, existing);
> return newBuffer; {code}
> It's not clear to me, exactly, what the difference is between the old and new 
> code. It's not sufficient to simply call {{flip()}} on the inputBuffer before 
> returning it (I tried it and it didn't fix the bug). More work is needed.
> h2. Resolution
> When this ticket is complete the bug will be fixed and 
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)]  this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're 
> at it (the test currently holds it at true).
> The attached patch file GEODE-9825-demo.patch shows a quick hack to 
> {{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to