[
https://issues.apache.org/jira/browse/SPARK-18919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakub Liska closed SPARK-18919.
-------------------------------
Resolution: Not A Problem
Ahh, my fault: the OpenHashSet rehashes at 734004 entries and doubles the size
of the array ...
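
The numbers can be checked with a minimal sketch. It assumes a load factor of
0.7, a capacity rounded up to the next power of two, a rehash once the size
exceeds the grow threshold, and 8 bytes per Long. The object and helper names
below are hypothetical and are not taken from the Spark sources.

```scala
object RehashFootprint {
  // Assumption: the set is created with a load factor of 0.7.
  val LoadFactor = 0.7

  // Round a requested capacity up to the next power of two (hypothetical helper).
  def nextPowerOf2(n: Int): Int = {
    val highBit = Integer.highestOneBit(n)
    if (highBit == n) n else highBit << 1
  }

  // Largest size the table may hold before it grows (assumed rule: size > threshold).
  def growThreshold(capacity: Int): Int = (LoadFactor * capacity).toInt

  // Bytes taken by the elements of a primitive Array[Long], ignoring the array header.
  def longArrayBytes(capacity: Int): Long = capacity.toLong * 8L

  def main(args: Array[String]): Unit = {
    val initial = nextPowerOf2(1000 * 1000)   // 1048576
    val threshold = growThreshold(initial)    // 734003, so entry 734004 triggers the rehash
    val grown = initial * 2                   // 2097152

    println(s"initial capacity:       $initial")
    println(s"rehash at entry:        ${threshold + 1}")
    println(s"capacity after rehash:  $grown")
    println(s"values array before:    ${longArrayBytes(initial)} bytes")
    println(s"values array after:     ${longArrayBytes(grown)} bytes")
  }
}
```

Under these assumptions the values array holds 2 097 152 unboxed Longs after
the rehash, i.e. 16 777 216 bytes, which matches the ~ 17 MB reported below
without any boxing.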
> PrimitiveKeyOpenHashMap is boxing values
> ----------------------------------------
>
> Key: SPARK-18919
> URL: https://issues.apache.org/jira/browse/SPARK-18919
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.2
> Environment: ubuntu 16.04, scala 2.11.7, 2.12.1, java 1.8.0_111
> Reporter: Jakub Liska
> Priority: Critical
>
> Hey, I was benchmarking PrimitiveKeyOpenHashMap for speed and memory
> footprint and noticed that the footprint is higher than it should be. If I
> add 1M [Long,Long] entries to it, it has:
> ~ 34 MB in total
> ~ 17 MB at OpenHashSet as keys
> ~ 17 MB at Array as values
> The Array size is strange though, because its initial size is 1 048 576
> (_keySet.capacity) so it should have ~ 8MB, not ~ 17MB because a Long value
> has 8 bytes. Therefore I think that the values are getting boxed in this
> collection.
> The consequence of this problem is that if you put more than 100M Long
> entries into this map, the GC gets choked to death with unlimited heap size ...
> The strange thing is that I get the same results when using @miniboxed
> instead of @specialized.
> This is the scalameter code I used:
> {code}
> class PrimitiveKeyOpenHashMapBench extends Bench.ForkedTime {
>   override def measurer = new Executor.Measurer.MemoryFootprint
>   val sizes = Gen.single("size")(1*1000*1000)
>   performance of "MemoryFootprint" in {
>     performance of "PrimitiveKeyOpenHashMap" in {
>       using(sizes) config (
>         exec.benchRuns -> 1,
>         exec.maxWarmupRuns -> 0,
>         exec.independentSamples -> 1,
>         exec.requireGC -> true,
>         exec.jvmflags -> List("-server", "-Xms1024m", "-Xmx6548m",
>           "-XX:+UseG1GC")
>       ) in { size =>
>         val map = new PrimitiveKeyOpenHashMap[Long, Long](size)
>         var index = 0L
>         while (index < size) {
>           map(index) = 0L
>           index += 1
>         }
>         println("Size " + SizeEstimator.estimate(map))
>         while (index != 0) {
>           index -= 1
>           assert(map.contains(index))
>         }
>         map
>       }
>     }
>   }
> }
> {code}