Alexander Dejanovski created CASSANDRA-16406:
------------------------------------------------

             Summary: Debug logging affects repair performance
                 Key: CASSANDRA-16406
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16406
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Repair
            Reporter: Alexander Dejanovski
            Assignee: Alexander Dejanovski
         Attachments: with_debug_logging.png, without_debug_logging.png

While working on the repair quality testing in CASSANDRA-16245, it appeared 
that the node coordinating repairs on a 20GB per node dataset was generating 
more than 2G of log with a total duration for the incremental repair scenarios 
of ~2h40m: 
[https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/37/workflows/6a7a41c8-0fca-4080-b37e-3b38998b3fab/jobs/49/steps
]
!with_debug_logging.png!  
The logs showed a lot of messages from the MerkleTree class at high pace:
{noformat}
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) 
Fully inconsistent range [#<TreeRange 
(-6738651767434294419,-6738622715859497972] depth=11>, #<TreeRange 
(-6738622715859497972,-6738593664284701525] depth=11>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (9) 
Inconsistent digest on right sub-range #<TreeRange 
(-6738593664284701525,-6738535561135108630] depth=10>: [#<OffHeapInner 
-6738564612709905078 
hash=[b8efd3d684474276f316b1bc9f23b463cda4f8d620a4b5cf4d2c7dbb101bbe85] 
children=[#<OffHeapLeaf 
[fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]> 
#<OffHeapLeaf 
[471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>]>, 
#<OffHeapInner -6738564612709905078 
hash=[95334327a0b50b7d3c51d6588a5f3d57701e0b978f6ab28e85cda3cb5a094eb5] 
children=[#<OffHeapLeaf 
[0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]> 
#<OffHeapLeaf 
[981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:262 - (10) 
Hashing sub-ranges [#<TreeRange (-6738593664284701525,-6738564612709905078] 
depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>] 
for #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10> divided 
by midpoint -6738564612709905078
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:272 - (10) 
Inconsistent digest on left sub-range #<TreeRange 
(-6738593664284701525,-6738564612709905078] depth=11>: [#<OffHeapLeaf 
[fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>, 
#<OffHeapLeaf 
[0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (10) 
Inconsistent digest on right sub-range #<TreeRange 
(-6738564612709905078,-6738535561135108630] depth=11>: [#<OffHeapLeaf 
[471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>, 
#<OffHeapLeaf 
[981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) 
Fully inconsistent range [#<TreeRange 
(-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange 
(-6738564612709905078,-6738535561135108630] depth=11>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (9) Fully 
inconsistent range [#<TreeRange (-6738651767434294419,-6738593664284701525] 
depth=10>, #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>]
DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (8) 
Inconsistent digest on right sub-range #<TreeRange 
(-6738535561135108630,-6738419354835922841] depth=9>: [#<OffHeapInner 
-6738477457985515736 
hash=[806ede1a35783bf5fafe8b8ccefe4d3ff48e8f0e1314f8a9ce4b23f13fed4bf9] 
children=[#<OffHeapInner -6738506509560312183 
hash=[e6d133afbd8041266f8a1cfe456ff07c9e7debe8ff54279b579b252f2af78b6b] 
children=[#]> #<OffHeapInner -6738448406410719289 
hash=[66bfedb588f87ad3957497728b91bd436af364e6ec40df3299d006de151ac092] 
children=[#]>]>, #<OffHeapInner -6738477457985515736 
hash=[bf128431ddf7ad72417ce853f90abd0bc7a2a38088d5fbec864c02c516ed6458] 
children=[#<OffHeapInner -6738506509560312183 
hash=[12a71b62ceb096369c4797f56edd91a16dd9803348b4f4cfe1454027d7dae3d2] 
children=[#]> #<OffHeapInner -6738448406410719289 
hash=[adb59f5313473b44dd3b7fa697d72caaaa7b23b3c0610f23670942e2c137878a] 
children=[#]>]>]{noformat}
When disabling debug logging, the duration dropped to ~2h05m with decent log 
sizes:

[https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/38/workflows/c91e6ceb-438b-4f00-b38a-d670f9afb4c3/jobs/51]

!without_debug_logging.png!

There's apparently too much logging for each inconsistency found in the Merkle 
tree comparisons and we should move this to TRACE level if we still want to 
allow debug logging to be turned on by default.

I'll prepare a patch for the MerkleTree class and run the repair testing 
scenarios again to verify their duration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to