[ 
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9825:
--------------------------------
    Description: 
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
sender and locator and receiver use different configuration parameters. Set 
{{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the receiver. 
Oh and just skip the call to {{{}securityProperties(){}}}—we want to induce the 
"Unknown header byte" exception—we don't want the TLS framework throwing 
exceptions. See attached patch file GEODE-9825-demo.patch for an example.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

The line of code referenced above used to be this method in {{Connection}} 
(which has since been removed):
{code:java}
private void compactOrResizeBuffer(int messageLength) {
  final int oldBufferSize = inputBuffer.capacity();
  int allocSize = messageLength + MSG_HEADER_BYTES;
  if (oldBufferSize < allocSize) {
    // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);

    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    }
  } else {
    if (inputBuffer.position() != 0) {
      inputBuffer.compact();
    } else {
      inputBuffer.position(inputBuffer.limit());
      inputBuffer.limit(inputBuffer.capacity());
    }
  }
} {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

It's not sufficient to simply call {{flip()}} on the inputBuffer before 
returning it (I tried it and it didn't fix the bug). More work is needed.
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

The attached patch file GEODE-9825-demo.patch shows a quick hack to 
{{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.

  was:
GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
byte..."}} and hangs if members are configured with different 
{{socket-buffer-size}} settings.
h2. Reproduction

To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
sender and locator and receiver use different configuration parameters. Set 
{{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the receiver. 
Oh and just skip the call to {{{}securityProperties(){}}}—we want to induce the 
"Unknown header byte" exception—we don't want the TLS framework throwing 
exceptions.
h2. Analysis

In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
messages it can from the current input buffer, it then considers whether the 
buffer needs expansion. If it does then:
{code:java}
inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
Is executed and the method returns. The caller then expects to be able to 
_write_ bytes into {{{}inputBuffer{}}}.

The problem, it seems, is that 
{{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave the 
the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be _read_ 
not written.

The line of code referenced above used to be this method in {{Connection}} 
(which has since been removed):
{code:java}
private void compactOrResizeBuffer(int messageLength) {
  final int oldBufferSize = inputBuffer.capacity();
  int allocSize = messageLength + MSG_HEADER_BYTES;
  if (oldBufferSize < allocSize) {
    // need a bigger buffer
    logger.info("Allocating larger network read buffer, new size is {} old size 
was {}.",
        allocSize, oldBufferSize);
    ByteBuffer oldBuffer = inputBuffer;
    inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);

    if (oldBuffer != null) {
      int oldByteCount = oldBuffer.remaining();
      inputBuffer.put(oldBuffer);
      inputBuffer.position(oldByteCount);
      getBufferPool().releaseReceiveBuffer(oldBuffer);
    }
  } else {
    if (inputBuffer.position() != 0) {
      inputBuffer.compact();
    } else {
      inputBuffer.position(inputBuffer.limit());
      inputBuffer.limit(inputBuffer.capacity());
    }
  }
} {code}
Notice how this method leaves {{inputBuffer}} ready to be _written_ to.

It's not sufficient to simply call {{flip()}} on the inputBuffer before 
returning it (I tried it and it didn't fix the bug). More work is needed.
h2. Resolution

When this ticket is complete the bug will be fixed and 
{{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
combinations:

[security, sender/locator socket-buffer-size, receiver socket-buffer-size]

[TLS, (default), (default)]  this is what the test currently does
[no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
[no TLS, (default), (default)] *new*

We might want to mix in conserve-sockets true/false in there too while we're at 
it (the test currently holds it at true).

The attached patch file GEODE-9825-demo.patch shows a quick hack to 
{{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.


> Disparate socket-buffer-size Results in "IOException: Unknown header byte" 
> and Hangs
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-9825
>                 URL: https://issues.apache.org/jira/browse/GEODE-9825
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>    Affects Versions: 1.12.4, 1.15.0
>            Reporter: Bill Burcham
>            Priority: Major
>         Attachments: GEODE-9825-demo.patch
>
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
> byte..."}} and hangs if members are configured with different 
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that 
> sender and locator and receiver use different configuration parameters. Set 
> {{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the 
> receiver. Oh and just skip the call to {{{}securityProperties(){}}}—we want 
> to induce the "Unknown header byte" exception—we don't want the TLS framework 
> throwing exceptions. See attached patch file GEODE-9825-demo.patch for an 
> example.
> h2. Analysis
> In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
> messages it can from the current input buffer, it then considers whether the 
> buffer needs expansion. If it does then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> Is executed and the method returns. The caller then expects to be able to 
> _write_ bytes into {{{}inputBuffer{}}}.
> The problem, it seems, is that 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave 
> the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be 
> _read_ not written.
> The line of code referenced above used to be this method in {{Connection}} 
> (which has since been removed):
> {code:java}
> private void compactOrResizeBuffer(int messageLength) {
>   final int oldBufferSize = inputBuffer.capacity();
>   int allocSize = messageLength + MSG_HEADER_BYTES;
>   if (oldBufferSize < allocSize) {
>     // need a bigger buffer
>     logger.info("Allocating larger network read buffer, new size is {} old 
> size was {}.",
>         allocSize, oldBufferSize);
>     ByteBuffer oldBuffer = inputBuffer;
>     inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);
>     if (oldBuffer != null) {
>       int oldByteCount = oldBuffer.remaining();
>       inputBuffer.put(oldBuffer);
>       inputBuffer.position(oldByteCount);
>       getBufferPool().releaseReceiveBuffer(oldBuffer);
>     }
>   } else {
>     if (inputBuffer.position() != 0) {
>       inputBuffer.compact();
>     } else {
>       inputBuffer.position(inputBuffer.limit());
>       inputBuffer.limit(inputBuffer.capacity());
>     }
>   }
> } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> It's not sufficient to simply call {{flip()}} on the inputBuffer before 
> returning it (I tried it and it didn't fix the bug). More work is needed.
> h2. Resolution
> When this ticket is complete the bug will be fixed and 
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these 
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)]  this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're 
> at it (the test currently holds it at true).
> The attached patch file GEODE-9825-demo.patch shows a quick hack to 
> {{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to