lbradstreet commented on a change in pull request #8467:
URL: https://github.com/apache/kafka/pull/8467#discussion_r414592818



##########
File path: core/src/main/scala/kafka/log/LogManager.scala
##########
@@ -1003,9 +1003,17 @@ class LogManager(logDirs: Seq[File],
   /**
    * Map of log dir to logs by topic and partitions in that dir
    */
-  private def logsByDir: Map[String, Map[TopicPartition, Log]] = {
-    (this.currentLogs.toList ++ this.futureLogs.toList).toMap
-      .groupBy { case (_, log) => log.parentDir }
+  def logsByDir: Map[String, Map[TopicPartition, Log]] = {
+    // This code is called often by checkpoint processes and is written in a 
way that reduces
+    // allocations and CPU with many topic partitions.
+    // When changing this code please measure the changes with 
org.apache.kafka.jmh.server.CheckpointBench
+    val byDir = new mutable.AnyRefMap[String, 
mutable.AnyRefMap[TopicPartition, Log]]()
+    def addToDir(tp: TopicPartition, log: Log): Unit = {
+      byDir.getOrElseUpdate(log.parentDir, new 
mutable.AnyRefMap[TopicPartition, Log]()).put(tp, log)
+    }
+    currentLogs.foreach { case (tp, log) => addToDir(tp, log) }
+    futureLogs.foreach { case (tp, log) => addToDir(tp, log) }

Review comment:
       @ijuma I had mistakenly thought it was taking a tuple there, and maybe 
that was just the way Scala converted a lambda to a BiConsumer, and the results 
had regressed back to close to what I was seeing with the iterator version.
   
   When I ran it, it returned saw:
   ```
   1140074.937 ± 75151.914    B/op
   def foreachEntry(f: (K, V) => Unit): Unit = {
       pool.forEach((k, v) => f(k, v))
     }
   
   1044120.722 ±  1676.137    B/op (included in an above comment)   
   pool.forEach(new BiConsumer[K,V] {
         override def accept(t: K, u: V): Unit = f(t, u)
       })
   ```
   
   Strangely I have just re-run the BiConsumer version and it returned 
`1188097.413 ±  1639.537    B/op`. I'm not sure why it's regressed from what I 
saw on a previous run. I am OK with using the version with a lambda if you are. 
I'm not sure I will have time to investigate it further, and it's still a good 
improvement compared to the version that used `foreach`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to