[ 
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448309#comment-17448309
 ] 

Bill Burcham commented on GEODE-9825:
-------------------------------------

Merged to {{{}develop{}}}. Back-port PR to 1.14 is ready to merge. A flaky test 
failed in the PR for 1.13 (wrote a new ticket GEODE-9850 and re-initiated the 
test).

Back-port to 1.12 has a problem. I had to back-port the PR for GEODE-9713 (test 
framework enhancement). Unfortunately it relies on a newer version (4.1.0) of 
Awaitility (was at 3.1.6). Bumping just that version in 
DependencyConstraints.groovy was not sufficient as something (TBD) is dependent 
on Awaitility 2.0.0 and that version is taking precedence. I want to work this 
out and then merge all three PRs together (in close succession).

> Disparate socket-buffer-size Results in "IOException: Unknown header byte" 
> and Hangs
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-9825
>                 URL: https://issues.apache.org/jira/browse/GEODE-9825
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>    Affects Versions: 1.12.4, 1.15.0
>            Reporter: Bill Burcham
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: pull-request-available
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
> byte..."}} and hangs if members are configured with different 
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug turn off TLS and set socket-buffer-size on sender to be 
> 64KB and set socket-buffer-size on receiver to be 32KB. See associated PR for 
> an example.
> h2. Analysis
> In {{{}Connection.processInputBuffer(){}}}. When that method has read all the 
> messages it can from the current input buffer, it then considers whether the 
> buffer needs expansion. If it does then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> Is executed and the method returns. The caller then expects to be able to 
> _write_ bytes into {{{}inputBuffer{}}}.
> The problem, it seems, is that 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave 
> the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be 
> _read_ not written.
> Before the changes for GEODE-9141 were introduced, the line of code 
> referenced above used to be this snippet in 
> {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has 
> since been removed):
> {code:java}
>      // need a bigger buffer
>     logger.info("Allocating larger network read buffer, new size is {} old 
> size was {}.",
>         allocSize, oldBufferSize);
>     ByteBuffer oldBuffer = inputBuffer;
>     inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
>     if (oldBuffer != null) {
>       int oldByteCount = oldBuffer.remaining();
>       inputBuffer.put(oldBuffer);
>       inputBuffer.position(oldByteCount);
>       getBufferPool().releaseReceiveBuffer(oldBuffer);
>     } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> But the code inside 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing 
> something like:
> {code:java}
> newBuffer.clear();
> newBuffer.put(existing);
> newBuffer.flip();
> releaseBuffer(type, existing);
> return newBuffer; {code}
> A solution (shown in the associated PR) is to do add logic after the call to 
> {{expandReadBufferIfNeeded(allocSize)}} to leave the buffer in a _writeable_ 
> state:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize);
> // we're returning to the caller (done == true) so make buffer writeable
> inputBuffer.position(inputBuffer.limit());
> inputBuffer.limit(inputBuffer.capacity()); {code}
> h2. Resolution
> When this ticket is complete the bug will be fixed and 
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these 
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)]  this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're 
> at it (the test currently holds it at true).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to