Jakub Liska created SPARK-18919:
-----------------------------------
Summary: PrimitiveKeyOpenHashMap is boxing values
Key: SPARK-18919
URL: https://issues.apache.org/jira/browse/SPARK-18919
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.2
Environment: Ubuntu 16.04, Scala 2.11.7 / 2.12.1, Java 1.8.0_111
Reporter: Jakub Liska
Priority: Critical
Hey, I was benchmarking PrimitiveKeyOpenHashMap for speed and memory footprint
and I noticed that the footprint is higher than it should be. After adding 1M
[Long, Long] entries to it, it has:
~34 MB in total
~17 MB in the OpenHashSet holding the keys
~17 MB in the Array holding the values
The values Array size is strange, though: its initial size is 1,048,576
(_keySet.capacity), so at 8 bytes per Long value it should take ~8 MB, not
~17 MB. Therefore I think the values are getting boxed in this collection. The
consequence is that if you put more than 100M Long entries into this map, the
GC gets choked to death...
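To make the expected numbers concrete, here is a quick comparison of
SizeEstimator on a primitive Array[Long] versus a boxed Array[java.lang.Long]
of the same length (a minimal sketch, assuming spark-core on the classpath;
the ArrayFootprint name is just for this example):
{code}
import org.apache.spark.util.SizeEstimator

object ArrayFootprint extends App {
  val n = 1 << 20  // 1,048,576 slots, same as _keySet.capacity above

  // Primitive long[]: n * 8 bytes, i.e. ~8 MB.
  val primitive = new Array[Long](n)
  println("Array[Long]           : " + SizeEstimator.estimate(primitive))

  // Boxed Long[]: n references plus up to n java.lang.Long boxes,
  // considerably more than the primitive array's ~8 MB.
  val boxed = Array.tabulate[java.lang.Long](n)(i => Long.box(i))
  println("Array[java.lang.Long] : " + SizeEstimator.estimate(boxed))
}
{code}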
This is the ScalaMeter code I used:
{code}
import org.apache.spark.util.SizeEstimator
import org.apache.spark.util.collection.PrimitiveKeyOpenHashMap
import org.scalameter.api._

class PrimitiveKeyOpenHashMapBench extends Bench.ForkedTime {

  // Measure the heap footprint of the returned object instead of running time.
  override def measurer = new Executor.Measurer.MemoryFootprint

  val sizes = Gen.single("size")(1 * 1000 * 1000)

  performance of "MemoryFootprint" in {
    performance of "PrimitiveKeyOpenHashMap" in {
      using(sizes) config (
        exec.benchRuns -> 1,
        exec.maxWarmupRuns -> 0,
        exec.independentSamples -> 1,
        exec.requireGC -> true,
        exec.jvmflags -> List("-server", "-Xms1024m", "-Xmx6548m", "-XX:+UseG1GC")
      ) in { size =>
        val map = new PrimitiveKeyOpenHashMap[Long, Long](size)

        // Insert 1M [Long, Long] entries.
        var index = 0L
        while (index < size) {
          map(index) = 0L
          index += 1
        }
        println("Size " + SizeEstimator.estimate(map))

        // Sanity check: every inserted key is present.
        while (index != 0) {
          index -= 1
          assert(map.contains(index))
        }
        map
      }
    }
  }
}
{code}
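As a quick sanity check on the boxing hypothesis: an array allocated through a
ClassTag should come out as an unboxed long[], which can be verified by
inspecting its component type (a minimal sketch; makeArray is a hypothetical
stand-in for how the Spark open hash collections allocate their value arrays):
{code}
import scala.reflect.ClassTag

object BoxingCheck extends App {
  // Stand-in for ClassTag-based array allocation.
  def makeArray[V: ClassTag](n: Int): Array[V] = new Array[V](n)

  val values = makeArray[Long](4)
  // Prints "long" if the array is an unboxed long[],
  // "class java.lang.Long" if the elements are boxed.
  println(values.getClass.getComponentType)
}
{code}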