[ 
https://issues.apache.org/jira/browse/GEODE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557541#comment-17557541
 ] 

ASF subversion and git services commented on GEODE-10323:
---------------------------------------------------------

Commit 8b85b2e07d2da39e82bd9ca4817b44bc24d37876 in geode's branch 
refs/heads/develop from Alberto Gomez
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8b85b2e07d ]

GEODE-10323: Remove schedule threads in MemoryAllocatorImpl constructor (#7715)

* GEODE-10323: Remove schedule threads in MemoryAllocatorImpl

The scheduled executor used in MemoryAllocatorImpl
was scheduled in the constructor. This provoked
intermittent failures in OffHeapStorageJUnitTest testCreateOffHeapStorage
test cases due to a race condition.

The scheduling has been moved to a new method (start())
in the MemoryAllocatorImpl class that is in turn
invoked in the create() static method.

* GEODE-10323: Extract update stats code to new class

> OffHeapStorageJUnitTest testCreateOffHeapStorage fails with AssertionError: 
> expected:<100> but was:<1048576>
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-10323
>                 URL: https://issues.apache.org/jira/browse/GEODE-10323
>             Project: Geode
>          Issue Type: Bug
>          Components: offheap
>    Affects Versions: 1.16.0
>            Reporter: Kirk Lund
>            Assignee: Alberto Gomez
>            Priority: Major
>              Labels: pull-request-available
>
> [Please see 1st comment on this ticket below the description for 
> recommendation on how to fix.]
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing 
> intermittently due to commit a350ed2 which went in as 
> "[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance 
> off-heap fragmentation visibility. 
> ([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])".
> The failure stack looks like:
> {noformat}
> OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED
>     java.lang.AssertionError: expected:<100> but was:<1048576>
>         at org.junit.Assert.fail(Assert.java:89)
>         at org.junit.Assert.failNotEquals(Assert.java:835)
>         at org.junit.Assert.assertEquals(Assert.java:647)
>         at org.junit.Assert.assertEquals(Assert.java:633)
>         at 
> org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220)
> {noformat}
> The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor 
> {{updateNonRealTimeStatsExecutor}} was added near the bottom of the 
> constructor which immediately gets scheduled to invoke 
> {{freeList::updateNonRealTimeStats}} repeatedly at the specified frequency of 
> {{updateOffHeapStatsFrequencyMs}} whenever an instance of 
> {{MemoryAllocatorImpl}} is created.
> {{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and 
> {{setFreedChunks}}.
> Scheduling or starting any sort of threads from a constructor is considered a 
> bad practice and should only be done via some sort of activation method such 
> as {{start()}} which can be invoked from the static factory method for 
> {{MemoryAllocatorImpl}}. The unit tests would then continue to construct 
> instances via a constructor that does NOT invoke {{start()}}.
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other 
> tests, is written with the assumption that no other thread or component can 
> or will be updating the {{OffHeapStats}}. Hence it is then safe for the test 
> to setLargestFragment and then assert that it has the value passed in:
> {noformat}
> 219:      stats.setLargestFragment(100);
> 220:      assertEquals(100, stats.getLargestFragment());
> {noformat}
> Line 220 is the source of the assertion failure because 
> {{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and 
> changes the value of {{setLargestFragment}} immediately after the test sets 
> the value to 100, but before the test invokes the assertion.
> Looking back at the [PR test 
> results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can 
> see that stress-new-test failed 5 times with this exact failure before being 
> merged to develop:
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/
>  (x2)
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to