Alexander Dejanovski created CASSANDRA-16406:
------------------------------------------------
Summary: Debug logging affects repair performance
Key: CASSANDRA-16406
URL: https://issues.apache.org/jira/browse/CASSANDRA-16406
Project: Cassandra
Issue Type: Bug
Components: Consistency/Repair
Reporter: Alexander Dejanovski
Assignee: Alexander Dejanovski
Attachments: with_debug_logging.png, without_debug_logging.png
While working on the repair quality testing in CASSANDRA-16245, it appeared
that the node coordinating repairs on a 20GB per node dataset was generating
more than 2G of log with a total duration for the incremental repair scenarios
of ~2h40m:
[https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/37/workflows/6a7a41c8-0fca-4080-b37e-3b38998b3fab/jobs/49/steps
]
!with_debug_logging.png!
The logs showed a lot of messages from the MerkleTree class at high pace:
{noformat}
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10)
Fully inconsistent range [#<TreeRange
(-6738651767434294419,-6738622715859497972] depth=11>, #<TreeRange
(-6738622715859497972,-6738593664284701525] depth=11>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (9)
Inconsistent digest on right sub-range #<TreeRange
(-6738593664284701525,-6738535561135108630] depth=10>: [#<OffHeapInner
-6738564612709905078
hash=[b8efd3d684474276f316b1bc9f23b463cda4f8d620a4b5cf4d2c7dbb101bbe85]
children=[#<OffHeapLeaf
[fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>
#<OffHeapLeaf
[471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>]>,
#<OffHeapInner -6738564612709905078
hash=[95334327a0b50b7d3c51d6588a5f3d57701e0b978f6ab28e85cda3cb5a094eb5]
children=[#<OffHeapLeaf
[0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>
#<OffHeapLeaf
[981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:262 - (10)
Hashing sub-ranges [#<TreeRange (-6738593664284701525,-6738564612709905078]
depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>]
for #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10> divided
by midpoint -6738564612709905078
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:272 - (10)
Inconsistent digest on left sub-range #<TreeRange
(-6738593664284701525,-6738564612709905078] depth=11>: [#<OffHeapLeaf
[fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>,
#<OffHeapLeaf
[0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (10)
Inconsistent digest on right sub-range #<TreeRange
(-6738564612709905078,-6738535561135108630] depth=11>: [#<OffHeapLeaf
[471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>,
#<OffHeapLeaf
[981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10)
Fully inconsistent range [#<TreeRange
(-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange
(-6738564612709905078,-6738535561135108630] depth=11>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (9) Fully
inconsistent range [#<TreeRange (-6738651767434294419,-6738593664284701525]
depth=10>, #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (8)
Inconsistent digest on right sub-range #<TreeRange
(-6738535561135108630,-6738419354835922841] depth=9>: [#<OffHeapInner
-6738477457985515736
hash=[806ede1a35783bf5fafe8b8ccefe4d3ff48e8f0e1314f8a9ce4b23f13fed4bf9]
children=[#<OffHeapInner -6738506509560312183
hash=[e6d133afbd8041266f8a1cfe456ff07c9e7debe8ff54279b579b252f2af78b6b]
children=[#]> #<OffHeapInner -6738448406410719289
hash=[66bfedb588f87ad3957497728b91bd436af364e6ec40df3299d006de151ac092]
children=[#]>]>, #<OffHeapInner -6738477457985515736
hash=[bf128431ddf7ad72417ce853f90abd0bc7a2a38088d5fbec864c02c516ed6458]
children=[#<OffHeapInner -6738506509560312183
hash=[12a71b62ceb096369c4797f56edd91a16dd9803348b4f4cfe1454027d7dae3d2]
children=[#]> #<OffHeapInner -6738448406410719289
hash=[adb59f5313473b44dd3b7fa697d72caaaa7b23b3c0610f23670942e2c137878a]
children=[#]>]>]{noformat}
When disabling debug logging, the duration dropped to ~2h05m with decent log
sizes:
[https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/38/workflows/c91e6ceb-438b-4f00-b38a-d670f9afb4c3/jobs/51]
!without_debug_logging.png!
There's apparently too much logging for each inconsistency found in the Merkle
tree comparisons and we should move this to TRACE level if we still want to
allow debug logging to be turned on by default.
I'll prepare a patch for the MerkleTree class and run the repair testing
scenarios again to verify their duration.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]