[ 
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9825:
--------------------------------
    Description: 
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug turn off TLS and set socket-buffer-size on sender to be 
64KB and set socket-buffer-size on receiver to be 32KB. See associated PR for 
an example.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

Before the changes for GEODE-9141 were introduced, the line of code referenced 
above used to be this snippet in {{Connection.compactOrResizeBuffer(int 
messageLength)}} (that method has since been removed):
{code:java}
     // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    } {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

But the code inside 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing something 
like:
{code:java}
newBuffer.clear();
newBuffer.put(existing);
newBuffer.flip();
releaseBuffer(type, existing);
return newBuffer; {code}
A solution (shown in the associated PR) is to do add logic after the call to 
{{expandReadBufferIfNeeded(allocSize)}} to leave the buffer in a _writeable_ 
state:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize);
// we're returning to the caller (done == true) so make buffer writeable
inputBuffer.position(inputBuffer.limit());
inputBuffer.limit(inputBuffer.capacity()); {code}
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

  was:
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
sender and locator and receiver use different configuration parameters. Set 
{{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the receiver. 
Oh and just skip the call to {{{}securityProperties(){}}}—we want to induce the 
"Unknown header byte" exception—we don't want the TLS framework throwing 
exceptions. See attached patch file GEODE-9825-demo.patch for an example.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

Before the changes for GEODE-9141 were introduced, the line of code referenced 
above used to be this snippet in {{Connection.compactOrResizeBuffer(int 
messageLength)}} (that method has since been removed):
{code:java}
     // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    } {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

But the code inside 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing something 
like:
{code:java}
newBuffer.clear();
newBuffer.put(existing);
newBuffer.flip();
releaseBuffer(type, existing);
return newBuffer; {code}
The solution (shown in the attached patch file GEODE-9825-demo.patch) is to do 
add logic after the call to {{expandReadBufferIfNeeded(allocSize)}} to leave 
the buffer in a _writeable_ state:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize);
// we're returning to the caller (done == true) so make buffer writeable
inputBuffer.position(inputBuffer.limit());
inputBuffer.limit(inputBuffer.capacity()); {code}
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

The attached patch file GEODE-9825-demo.patch shows a quick hack to 
{{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug. The patch also 
includes a fix.


> Disparate socket-buffer-size Results in "IOException: Unknown header byte" 
> and Hangs
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-9825
>                 URL: https://issues.apache.org/jira/browse/GEODE-9825
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>    Affects Versions: 1.12.4, 1.15.0
>            Reporter: Bill Burcham
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: GEODE-9825-demo.patch
>
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
> byte..."}} and hangs if members are configured with different 
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug turn off TLS and set socket-buffer-size on sender to be 
> 64KB and set socket-buffer-size on receiver to be 32KB. See associated PR for 
> an example.
> h2. Analysis
> In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
> messages it can from the current input buffer, it then considers whether the 
> buffer needs expansion. If it does then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> Is executed and the method returns. The caller then expects to be able to 
> _write_ bytes into {{{}inputBuffer{}}}.
> The problem, it seems, is that 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave 
> the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be 
> _read_ not written.
> Before the changes for GEODE-9141 were introduced, the line of code 
> referenced above used to be this snippet in 
> {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has 
> since been removed):
> {code:java}
>      // need a bigger buffer
>     logger.info("Allocating larger network read buffer, new size is {} old 
> size was {}.",
>         allocSize, oldBufferSize);
>     ByteBuffer oldBuffer = inputBuffer;
>     inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
>     if (oldBuffer != null) {
>       int oldByteCount = oldBuffer.remaining();
>       inputBuffer.put(oldBuffer);
>       inputBuffer.position(oldByteCount);
>       getBufferPool().releaseReceiveBuffer(oldBuffer);
>     } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> But the code inside 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing 
> something like:
> {code:java}
> newBuffer.clear();
> newBuffer.put(existing);
> newBuffer.flip();
> releaseBuffer(type, existing);
> return newBuffer; {code}
> A solution (shown in the associated PR) is to do add logic after the call to 
> {{expandReadBufferIfNeeded(allocSize)}} to leave the buffer in a _writeable_ 
> state:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize);
> // we're returning to the caller (done == true) so make buffer writeable
> inputBuffer.position(inputBuffer.limit());
> inputBuffer.limit(inputBuffer.capacity()); {code}
> h2. Resolution
> When this ticket is complete the bug will be fixed and 
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these 
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)]  this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're 
> at it (the test currently holds it at true).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to