[
https://issues.apache.org/jira/browse/GEODE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102214#comment-16102214
]
ASF GitHub Bot commented on GEODE-3286:
---------------------------------------
Github user WireBaron commented on a diff in the pull request:
https://github.com/apache/geode/pull/657#discussion_r129684574
--- Diff: geode-core/src/main/java/org/apache/geode/internal/tcp/ConnectionTable.java ---
@@ -279,26 +280,29 @@ protected void acceptConnection(Socket sock) throws IOException, ConnectionExcep
// in our caller.
// no need to log error here since caller will log warning
- if (conn != null && !finishedConnecting) {
+ if (connection != null && !finishedConnecting) {
// we must be throwing from checkCancelInProgress so close the connection
- closeCon(LocalizedStrings.ConnectionTable_CANCEL_AFTER_ACCEPT.toLocalizedString(), conn);
- conn = null;
+ closeCon(LocalizedStrings.ConnectionTable_CANCEL_AFTER_ACCEPT.toLocalizedString(),
+ connection);
+ connection = null;
}
}
- if (conn != null) {
+ if (connection != null) {
synchronized (this.receivers) {
- this.owner.stats.incReceivers();
+ this.owner.getStats().incReceivers();
if (this.closed) {
closeCon(LocalizedStrings.ConnectionTable_CONNECTION_TABLE_NO_LONGER_IN_USE
- .toLocalizedString(), conn);
+ .toLocalizedString(), connection);
return;
}
- this.receivers.add(conn);
+ if (!connection.isSocketClosed()) {
--- End diff --
The connection-removal code is spread all over and is done in different
ways depending on how and when the connection was closed. Tracking all of
that down is where we spent the bulk of the four days we worked on this.
The only place we found that was obviously wrong was this section of code,
which wasn't properly dealing with connections that were closed immediately
in another thread.
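
To illustrate the race described above, here is a minimal sketch of the
check-before-add pattern. It is not the Geode code: `ReceiverTableSketch`,
`TrackedConnection`, `register`, and `close` are hypothetical names made up
for this example, and it assumes that closing and registering take the same
lock. A connection that was closed before registration is skipped, so it can
never be left behind in the receiver set.

```java
import java.util.HashSet;
import java.util.Set;

class ReceiverTableSketch {

  /** Hypothetical stand-in for a connection; not Geode's Connection class. */
  static final class TrackedConnection {
    private volatile boolean closed;

    boolean isClosed() {
      return closed;
    }

    void markClosed() {
      closed = true;
    }
  }

  private final Set<TrackedConnection> receivers = new HashSet<>();

  /** Adds the connection unless it is already closed; returns true if added. */
  boolean register(TrackedConnection connection) {
    synchronized (receivers) {
      if (connection.isClosed()) {
        // Closed in another thread before we got here: adding it now would
        // leak it, because nothing would ever remove it again.
        return false;
      }
      return receivers.add(connection);
    }
  }

  /** Marks the connection closed and removes it, under the same lock as register. */
  void close(TrackedConnection connection) {
    synchronized (receivers) {
      connection.markClosed();
      receivers.remove(connection);
    }
  }

  int size() {
    synchronized (receivers) {
      return receivers.size();
    }
  }

  public static void main(String[] args) {
    ReceiverTableSketch table = new ReceiverTableSketch();
    TrackedConnection live = new TrackedConnection();
    TrackedConnection closedEarly = new TrackedConnection();

    table.close(closedEarly);    // closed before it was ever registered
    table.register(live);        // added
    table.register(closedEarly); // skipped, already closed

    System.out.println(table.size()); // prints 1
  }
}
```

Without the `isClosed()` check in `register`, the early-closed connection
would be added after its only removal attempt had already run, which matches
the kind of leak reported in the issue below.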
> Failing to cleanup connections from ConnectionTable receiver table
> ------------------------------------------------------------------
>
> Key: GEODE-3286
> URL: https://issues.apache.org/jira/browse/GEODE-3286
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Brian Rowe
>
> This bug tracks GemFire issue 1544
> (https://jira-pivotal.atlassian.net/browse/GEM-1544).
> Hello team,
> A customer (VMware) is experiencing several {{OutOfMemoryError}}s on
> production servers, and they believe there's a memory leak within GemFire.
> Apparently 9.5GB of the heap is occupied by 487,828 instances of
> {{sun.security.ssl.SSLSocketImpl}}, and 7.7GB of the heap is occupied by
> 487,804 instances of {{sun.security.ssl.AppOutputStream}}, both referenced
> from the {{receivers}} attribute within the {{ConnectionTable}} class. I got
> this information from the Eclipse Memory Analyzer plugin; the images are
> attached.
> Below are some OQL queries that I was able to run within the plugin. It is
> weird that the collection of receivers is composed of 486,368 elements...
> {code}
> SELECT * FROM com.gemstone.gemfire.internal.tcp.ConnectionTable
> -> 1
> SELECT receivers.size FROM com.gemstone.gemfire.internal.tcp.ConnectionTable
> -> 486,368
> SELECT * FROM com.gemstone.gemfire.internal.tcp.Connection
> -> 487,758
> SELECT * FROM com.gemstone.gemfire.internal.tcp.Connection con WHERE con.stopped = true
> -> 486,461
> SELECT * FROM com.gemstone.gemfire.internal.tcp.Connection con WHERE con.stopped = false
> -> 1,297
> {code}
> That said, nothing in the statistics (maybe there's something, but I can't
> find it...) seems to point to a spike in the number of entries within the
> regions or in the current number of connections, nor to anything else that
> would explain the continuous drop of the available heap over time
> (chart#freeMemory).
> The heap dump (approximately 20GB) and the statistics (don't have logs yet,
> but they might not be required by looking at the heap and the statistics)
> have been uploaded to [Google
> Drive|https://drive.google.com/drive/folders/0BxDMZZTfEL4WUFZjbjhLMXptbEk?usp=sharing].
> Just for the record, apparently we delivered 8.2.0.6 to them a year and a half
> ago as a fix to [GEM-94|https://jira-pivotal.atlassian.net/browse/GEM-94] /
> [GEODE-332|https://issues.apache.org/jira/browse/GEODE-332], they've been
> running fine since then, until now. The last change in the
> {{ConnectionTable}} was done to fix these issues, so if there's actually a
> bug within the class, it will also exist on 8.2.5 (just a reminder to change
> the affected version field if required).
> The issue is not reproducible at will but happens in several of their
> environments; I haven't been able to reproduce it in my lab environment
> so far.
> Please let me know if you need anything else to proceed.
> Best regards.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)