Timothee Maret created SLING-10133:
--------------------------------------
Summary: Memory leak in MonitoringDistributionPackageBuilder
Key: SLING-10133
URL: https://issues.apache.org/jira/browse/SLING-10133
Project: Sling
Issue Type: Bug
Reporter: Timothee Maret
The MonitoringDistributionPackageBuilder maintains a list of MBeans for the
latest packages. The number of packages to be monitored is passed as the
[queueCapacity|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/b80cd8f3bae6b7875387ee7caaea271b7e9baec6/src/main/java/org/apache/sling/distribution/monitor/impl/MonitoringDistributionPackageBuilder.java#L49]
via the constructor. When the queueCapacity is 0, the monitoring is disabled.
[VaultDistributionPackageBuilderFactory|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/b80cd8f3bae6b7875387ee7caaea271b7e9baec6/src/main/java/org/apache/sling/distribution/serialization/impl/vlt/VaultDistributionPackageBuilderFactory.java#L201]
and
[DistributionPackageBuilderFactory|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/b80cd8f3bae6b7875387ee7caaea271b7e9baec6/src/main/java/org/apache/sling/distribution/serialization/impl/DistributionPackageBuilderFactory.java]
disable this feature by default. Even so, an environment that runs for multiple
weeks without a restart and with the default configuration will experience a
memory leak that eventually leads to the JVM running out of memory.
The implementation has two flaws that explain the memory leak.
h2. #1 - Registering an MBean when the queueCapacity is 0
The code [unconditionally registers an
MBean|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/b80cd8f3bae6b7875387ee7caaea271b7e9baec6/src/main/java/org/apache/sling/distribution/monitor/impl/MonitoringDistributionPackageBuilder.java#L106]
even if the queueCapacity is 0. We should register an MBean only when the
capacity is > 0.
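As a rough illustration of the guard described above, here is a minimal sketch. It is not the Sling code: the class GuardedMonitor, its methods, and the plain list standing in for the MBean server are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the builder; names are illustrative, not the Sling API. */
class GuardedMonitor {
    private final int queueCapacity;
    // Stand-in for the MBean server: records the package ids that were "registered".
    private final List<String> registered = new ArrayList<>();

    GuardedMonitor(int queueCapacity) {
        this.queueCapacity = queueCapacity;
    }

    /** Registers a monitoring entry only when monitoring is enabled. */
    void onPackageCreated(String packageId) {
        if (queueCapacity <= 0) {
            return; // monitoring disabled: register nothing, retain nothing
        }
        // Eviction of old entries is deliberately omitted here; see flaw #2.
        registered.add(packageId);
    }

    int registeredCount() {
        return registered.size();
    }
}
```

With a capacity of 0 the method returns before touching the registry, so nothing accumulates regardless of how many packages are built.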
h2. #2 - Concurrency issue when unregistering MBeans
The code [attempts to
remove|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/b80cd8f3bae6b7875387ee7caaea271b7e9baec6/src/main/java/org/apache/sling/distribution/monitor/impl/MonitoringDistributionPackageBuilder.java#L108]
an MBean by checking if the queueCapacity equals the size of the list of MBeans.
This check works in a single-threaded context, but it falls short when
registerDistributionPackageMBean is invoked concurrently. In the latter case,
it can happen that the check never holds true (for example, if concurrent
registrations push the size past the queueCapacity), leading the mBeans queue to
grow indefinitely. One solution is to leverage the features of the
LinkedBlockingDeque: create a LinkedBlockingDeque with bounded capacity and
rely on the status returned by the offer method to decide if an item needs to
be removed.
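The proposed approach could look roughly like the following sketch. It is an assumption about how the fix might be shaped, not the actual patch: BoundedMBeanQueue, add, unregister, and the String entries standing in for MBean registrations are hypothetical.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; class and method names are illustrative, not the Sling API. */
class BoundedMBeanQueue {
    private final LinkedBlockingDeque<String> names;
    private final AtomicInteger evicted = new AtomicInteger();

    BoundedMBeanQueue(int capacity) {
        // LinkedBlockingDeque rejects a capacity below 1, so callers must
        // skip monitoring entirely when the configured capacity is 0.
        this.names = new LinkedBlockingDeque<>(capacity);
    }

    /** Adds a name, evicting the oldest entries until the new one fits. */
    void add(String name) {
        // offerLast returns false instead of blocking when the deque is full.
        while (!names.offerLast(name)) {
            String oldest = names.pollFirst(); // null if another thread emptied it
            if (oldest != null) {
                unregister(oldest);
            }
        }
    }

    /** Placeholder for the point where real code would unregister the MBean. */
    private void unregister(String name) {
        evicted.incrementAndGet();
    }

    int size() { return names.size(); }
    int evictedCount() { return evicted.get(); }
    String oldest() { return names.peekFirst(); }
}
```

Because the deque itself enforces the bound, the size can never exceed the capacity, whatever the interleaving of concurrent callers; the eviction decision no longer depends on observing one exact size value.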