Repository: spark
Updated Branches:
  refs/heads/branch-2.2 fa3667ece -> 9bd25c9bf


[SPARK-23508][CORE] Fix BlockmanagerId in case blockManagerIdCache cause oom

## What changes were proposed in this pull request?
`blockManagerIdCache` in `BlockManagerId` never removes old entries, which may cause an OOM.

`val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()`
Every time a new BlockManagerId is created through `apply`, it is put into this map, so the map only grows.

This patch uses a Guava cache for `blockManagerIdCache` instead.
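
To illustrate the idea, here is a minimal sketch of a size-bounded interning cache, assuming Guava is on the classpath. `Id` and `IdInterner` are hypothetical stand-ins introduced only for this example; they are not the names used in Spark, and the real `BlockManagerId` has more fields.

```scala
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Hypothetical stand-in for BlockManagerId (assumption: equality by value).
final case class Id(executorId: String, host: String, port: Int)

object IdInterner {
  // The loader returns the key itself, so the first instance seen for a
  // given value becomes the canonical one. maximumSize bounds the number of
  // entries, so old ids are evicted instead of accumulating forever.
  private val cache: LoadingCache[Id, Id] = CacheBuilder.newBuilder()
    .maximumSize(10000)
    .build[Id, Id](new CacheLoader[Id, Id]() {
      override def load(id: Id): Id = id
    })

  // Returns the canonical instance equal to `id`, caching it on first use.
  def intern(id: Id): Id = cache.get(id)

  // Approximate number of entries currently held.
  def size: Long = cache.size()
}
```

With this sketch, two calls to `IdInterner.intern` with equal ids return the same instance while the entry is still cached, and interning many distinct ids does not grow the cache past its configured maximum.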

A heap dump is shown in [SPARK-23508](https://issues.apache.org/jira/browse/SPARK-23508).

## How was this patch tested?
Existing tests.

Author: zhoukang <zhoukang199...@gmail.com>

Closes #20667 from caneGuy/zhoukang/fix-history.

(cherry picked from commit 6a8abe29ef3369b387d9bc2ee3459a6611246ab1)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bd25c9b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bd25c9b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bd25c9b

Branch: refs/heads/branch-2.2
Commit: 9bd25c9bf263d5a5a203feeb14a0fefde7662b0e
Parents: fa3667e
Author: zhoukang <zhoukang199...@gmail.com>
Authored: Wed Feb 28 23:16:29 2018 +0800
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Wed Feb 28 23:17:06 2018 +0800

----------------------------------------------------------------------
 .../org/apache/spark/storage/BlockManagerId.scala     | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9bd25c9b/core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala b/core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala
index c37a360..a416f08 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala
@@ -18,7 +18,8 @@
 package org.apache.spark.storage
 
 import java.io.{Externalizable, IOException, ObjectInput, ObjectOutput}
-import java.util.concurrent.ConcurrentHashMap
+
+import com.google.common.cache.{CacheBuilder, CacheLoader}
 
 import org.apache.spark.SparkContext
 import org.apache.spark.annotation.DeveloperApi
@@ -132,10 +133,17 @@ private[spark] object BlockManagerId {
     getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
+  /**
+   * The max cache size is hardcoded to 10000, since the size of a BlockManagerId
+   * object is about 48B, the total memory cost should be below 1MB which is feasible.
+   */
+   */
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+    .maximumSize(10000)
+    .build(new CacheLoader[BlockManagerId, BlockManagerId]() {
+      override def load(id: BlockManagerId) = id
+    })
 
   def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
-    blockManagerIdCache.putIfAbsent(id, id)
     blockManagerIdCache.get(id)
   }
 }

