[ 
https://issues.apache.org/jira/browse/SPARK-18919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakub Liska closed SPARK-18919.
-------------------------------
    Resolution: Not A Problem

Ahh, my fault: the OpenHashSet rehashes at 734004 entries and doubles the size of 
the array ... 
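
For the record, the numbers line up once the resize is taken into account. Below is a rough sketch of the arithmetic, assuming OpenHashSet's default load factor of 0.7 and a capacity rounded up to a power of two (neither is stated in this thread). The object and method names are made up for illustration and are not Spark's API.

```scala
object GrowthSketch {
  // Assumption: default load factor of Spark's OpenHashSet.
  val LoadFactor: Double = 0.7

  // Round n up to the nearest power of two (n itself if it already is one).
  def nextPowerOf2(n: Int): Int = {
    val highBit = Integer.highestOneBit(n)
    if (highBit == n) n else highBit << 1
  }

  // Largest element count that does not yet trigger a rehash.
  def growThreshold(capacity: Int): Int = (LoadFactor * capacity).toInt

  val requested: Int = 1000 * 1000
  val initialCapacity: Int = nextPowerOf2(requested)       // 1048576
  val firstRehashAt: Int = growThreshold(initialCapacity) + 1 // 734004
  val grownCapacity: Int = initialCapacity * 2              // 2097152
  val valueArrayBytes: Long = grownCapacity.toLong * 8L     // 16777216

  def main(args: Array[String]): Unit = {
    println(s"initial capacity: $initialCapacity")
    println(s"first rehash at:  $firstRehashAt entries")
    println(s"grown capacity:   $grownCapacity")
    println(s"value array:      $valueArrayBytes bytes")
  }
}
```

Under these assumptions, 1M entries pass the threshold, so the value array holds 2 097 152 unboxed Long slots, about 16.8 MB, which matches the ~ 17 MB reported.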

> PrimitiveKeyOpenHashMap is boxing values
> ----------------------------------------
>
>                 Key: SPARK-18919
>                 URL: https://issues.apache.org/jira/browse/SPARK-18919
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: ubuntu 16.04, scala 2.11.7, 2.12.1, java 1.8.0_111
>            Reporter: Jakub Liska
>            Priority: Critical
>
> Hey, I was benchmarking PrimitiveKeyOpenHashMap for speed and memory 
> footprint and I noticed that the footprint is higher than it should be. If I 
> add 1M [Long,Long] entries to it, it has:
> ~ 34 MB in total
> ~ 17 MB at OpenHashSet as keys
> ~ 17 MB at Array as values
> The Array size is strange though, because its initial size is 1 048 576 
> (_keySet.capacity), so it should take ~ 8 MB, not ~ 17 MB, because a Long value 
> takes 8 bytes. Therefore I think that the values are getting boxed in this 
> collection.
> The consequence of this problem is that if you put more than 100M Long 
> entries into this map, the GC gets choked to death with unlimited heap size ...
> The strange thing is that I get the same results using @miniboxed instead of 
> @specialized.
> This is the ScalaMeter code I used:
> {code}
> class PrimitiveKeyOpenHashMapBench extends Bench.ForkedTime {
>   override def measurer = new Executor.Measurer.MemoryFootprint
>   val sizes = Gen.single("size")(1*1000*1000)
>   performance of "MemoryFootprint" in {
>     performance of "PrimitiveKeyOpenHashMap" in {
>       using(sizes) config (
>         exec.benchRuns -> 1,
>         exec.maxWarmupRuns -> 0,
>         exec.independentSamples -> 1,
>         exec.requireGC -> true,
>         exec.jvmflags -> List("-server", "-Xms1024m", "-Xmx6548m", "-XX:+UseG1GC")
>       ) in { size =>
>         val map = new PrimitiveKeyOpenHashMap[Long, Long](size)
>         var index = 0L
>         while (index < size) {
>           map(index) = 0L
>           index += 1
>         }
>         println("Size " + SizeEstimator.estimate(map))
>         while (index != 0) {
>           index -= 1
>           assert(map.contains(index))
>         }
>         map
>       }
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
