[
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446277#comment-17446277
]
Darrel Schneider commented on GEODE-9825:
-----------------------------------------
This issue can have many different manifestations but you can grep you locator
and server logs for "Allocating larger network read buffer". If you see that
info message then you are at risk of having some type of inconsistency in your
geode cluster.
The workaround for this issue is to set "socket-buffer-size" the same on all
the members of your cluster (both servers and locators).
> Disparate socket-buffer-size Results in "IOException: Unknown header byte"
> and Hangs
> ------------------------------------------------------------------------------------
>
> Key: GEODE-9825
> URL: https://issues.apache.org/jira/browse/GEODE-9825
> Project: Geode
> Issue Type: Bug
> Components: messaging
> Affects Versions: 1.12.4, 1.15.0
> Reporter: Bill Burcham
> Priority: Major
> Attachments: GEODE-9825-demo.patch
>
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header
> byte..."}} and hangs if members are configured with different
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug, modify {{P2PMessagingConcurrencyDUnitTest}} so that
> sender and locator and receiver use different configuration parameters. Set
> {{socket-buffer-size}} to 212992 for the sender and 32 * 1024 for the
> receiver. Oh and just skip the call to {{{}securityProperties(){}}}—we want
> to induce the "Unknown header byte" exception—we don't want the TLS framework
> throwing exceptions. See attached patch file GEODE-9825-demo.patch for an
> example.
> h2. Analysis
> In {{{}Connection.processInputBuffer(){}}}. When that method has read all the
> messages it can from the current input buffer, it then considers whether the
> buffer needs expansion. If it does then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> Is executed and the method returns. The caller then expects to be able to
> _write_ bytes into {{{}inputBuffer{}}}.
> The problem, it seems, is that
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave
> the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be
> _read_ not written.
> Before the changes for GEODE-9141 were introduced, the line of code
> referenced above used to be this snippet in
> {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has
> since been removed):
> {code:java}
> // need a bigger buffer
> logger.info("Allocating larger network read buffer, new size is {} old
> size was {}.",
> allocSize, oldBufferSize);
> ByteBuffer oldBuffer = inputBuffer;
> inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);
> if (oldBuffer != null) {
> int oldByteCount = oldBuffer.remaining();
> inputBuffer.put(oldBuffer);
> inputBuffer.position(oldByteCount);
> getBufferPool().releaseReceiveBuffer(oldBuffer);
> } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> But the code inside
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing
> something like:
> {code:java}
> newBuffer.clear();
> newBuffer.put(existing);
> newBuffer.flip();
> releaseBuffer(type, existing);
> return newBuffer; {code}
> It's not clear to me, exactly, what the difference is between the old and new
> code. It's not sufficient to simply call {{flip()}} on the inputBuffer before
> returning it (I tried it and it didn't fix the bug). More work is needed.
> h2. Resolution
> When this ticket is complete the bug will be fixed and
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test these
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)] this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're
> at it (the test currently holds it at true).
> The attached patch file GEODE-9825-demo.patch shows a quick hack to
> {{P2PMessagingConcurrencyDUnitTest}} to illustrate the bug.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)