[ 
https://issues.apache.org/jira/browse/CASSANDRA-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441464#comment-17441464
 ] 

Caleb Rackliffe edited comment on CASSANDRA-17039 at 11/11/21, 5:49 PM:
------------------------------------------------------------------------

The memory meter reports very different retained sizes for 
{{MeasureableRepairSession}} on J8 and J11, even though both runs use exactly 
the same C* codebase:

J8
{noformat}
INFO  [main] 2021-11-09 20:13:02,767 RepairJob.java:268 - Created 2 sync tasks based on 3 merkle tree responses for 6e524549-dbfd-4143-83b6-8a9920594f9e (took: 13ms)
INFO  [RepairJobTask:1] 2021-11-09 20:13:02,876 SyncTask.java:89 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Endpoints /127.0.0.1:7012 and /127.0.0.2:7012 have 1 range(s) out of sync for Standard1
INFO  [RepairJobTask:2] 2021-11-09 20:13:02,876 SyncTask.java:89 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Endpoints /127.0.0.1:7012 and /127.0.0.3:7012 have 1 range(s) out of sync for Standard1
INFO  [RepairJobTask:2] 2021-11-09 20:13:02,878 SymmetricRemoteSyncTask.java:68 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Forwarding streaming repair of 1 ranges to /127.0.0.1:7012 (to be streamed with /127.0.0.3:7012)
INFO  [RepairJobTask:1] 2021-11-09 20:13:02,878 SymmetricRemoteSyncTask.java:68 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Forwarding streaming repair of 1 ranges to /127.0.0.1:7012 (to be streamed with /127.0.0.2:7012)
ERROR [main] 2021-11-09 20:13:03,123 SubstituteLogger.java:265 - Size with trees: 9574096
DEBUG [RepairJobTask:2] 2021-11-09 20:13:03,125 RepairSession.java:241 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Repair completed between /127.0.0.1:7012 and /127.0.0.3:7012 on Standard1
DEBUG [RepairJobTask:1] 2021-11-09 20:13:03,125 RepairSession.java:241 - [repair #bba3e650-41cb-11ec-86cb-e76f756cd20a] Repair completed between /127.0.0.1:7012 and /127.0.0.2:7012 on Standard1
ERROR [main] 2021-11-09 20:13:03,275 SubstituteLogger.java:265 - Size without trees: 1863000
{noformat}

J11
{noformat}
INFO  [main] 2021-11-09 20:13:26,155 RepairJob.java:268 - Created 2 sync tasks based on 3 merkle tree responses for f9e90e8c-6ae6-4ca9-ba24-f40750d8b0f8 (took: 11ms)
INFO  [RepairJobTask:1] 2021-11-09 20:13:26,204 SyncTask.java:89 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Endpoints /127.0.0.1:7012 and /127.0.0.2:7012 have 1 range(s) out of sync for Standard1
INFO  [RepairJobTask:2] 2021-11-09 20:13:26,204 SyncTask.java:89 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Endpoints /127.0.0.1:7012 and /127.0.0.3:7012 have 1 range(s) out of sync for Standard1
INFO  [RepairJobTask:2] 2021-11-09 20:13:26,205 SymmetricRemoteSyncTask.java:68 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Forwarding streaming repair of 1 ranges to /127.0.0.1:7012 (to be streamed with /127.0.0.3:7012)
INFO  [RepairJobTask:1] 2021-11-09 20:13:26,205 SymmetricRemoteSyncTask.java:68 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Forwarding streaming repair of 1 ranges to /127.0.0.1:7012 (to be streamed with /127.0.0.2:7012)
ERROR [main] 2021-11-09 20:13:26,641 SubstituteLogger.java:265 - Size with trees: 16202960
DEBUG [RepairJobTask:1] 2021-11-09 20:13:26,643 RepairSession.java:241 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Repair completed between /127.0.0.1:7012 and /127.0.0.2:7012 on Standard1
DEBUG [RepairJobTask:2] 2021-11-09 20:13:26,643 RepairSession.java:241 - [repair #c9a25bb0-41cb-11ec-bba2-f9bd30f7271a] Repair completed between /127.0.0.1:7012 and /127.0.0.3:7012 on Standard1
ERROR [main] 2021-11-09 20:13:48,363 SubstituteLogger.java:265 - Size without trees: 8447392
{noformat}
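
To make the gap easier to read, here is a small sketch that only does 
arithmetic on the four "Size with trees" / "Size without trees" figures logged 
above. The class and method names are made up for illustration; nothing here 
comes from the test itself.

```java
// Hypothetical helper: plain arithmetic on the sizes copied from the logs above.
final class RetainedSizeDeltas {
    static final long J8_WITH = 9_574_096L;
    static final long J8_WITHOUT = 1_863_000L;
    static final long J11_WITH = 16_202_960L;
    static final long J11_WITHOUT = 8_447_392L;

    // Amount released between the "with trees" and "without trees" measurements.
    static long released(long withTrees, long withoutTrees) {
        return withTrees - withoutTrees;
    }

    public static void main(String[] args) {
        System.out.println("J8 released:          " + released(J8_WITH, J8_WITHOUT));
        System.out.println("J11 released:         " + released(J11_WITH, J11_WITHOUT));
        System.out.println("J11 - J8 (with):      " + (J11_WITH - J8_WITH));
        System.out.println("J11 - J8 (without):   " + (J11_WITHOUT - J8_WITHOUT));
    }
}
```

By these figures, the drop between the two measurements is similar on both JVMs 
(7711096 on J8, 7755568 on J11), while J11 reports roughly 6.6 million more in 
both measurements (6628864 with trees, 6584392 without). This is only 
arithmetic on the logged values and does not by itself identify the cause.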

One comment in the test says:

{noformat}
// The session retains memory in the contained executor until the threads expire, so we wait for the threads
// that ran the Tree -> SyncTask conversions to die and release the memory
{noformat}

In both cases above, the executor has two worker threads in its collection of 
workers before the wait and zero after it. Jamm should only be following live 
references, so I still have some digging to do. I might make the executor 
itself volatile and null it out before the second inspection...


was (Author: maedhroz):

It's almost as if only one of the two threads released the relevant memory...

> RepairJobTest.testNoTreesRetainedAfterDifference fails consistently on Java 11
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17039
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17039
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Brandon Williams
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>
> Sometimes fails an assertion:
> {noformat}
> Expecting:
>  <10000L>
> to be less than:
>  <10000L> 
> {noformat}
> https://app.circleci.com/pipelines/github/driftx/cassandra/269/workflows/f2b0a738-0785-4011-9ac1-071837dc9170/jobs/2049/tests#failed-test-1


