Jakub Liska created SPARK-18919:
-----------------------------------

             Summary: PrimitiveKeyOpenHashMap is boxing values
                 Key: SPARK-18919
                 URL: https://issues.apache.org/jira/browse/SPARK-18919
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
         Environment: ubuntu 16.04, scala 2.11.7, 2.12.1, java 1.8.0_111
            Reporter: Jakub Liska
            Priority: Critical


Hey, I was benchmarking PrimitiveKeyOpenHashMap for speed and memory footprint 
and noticed that the footprint is higher than it should be. If I add 1M 
[Long,Long] entries to it, it takes:
~ 34 MB in total
~ 17 MB in the OpenHashSet holding the keys
~ 17 MB in the Array holding the values

The Array size is strange though, because its initial size is 1 048 576 
(_keySet.capacity), so it should take ~ 8 MB, not ~ 17 MB, since a Long value 
takes 8 bytes. Therefore I think that the values are getting boxed in this collection.
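
To make the arithmetic above concrete, here is a minimal sketch of the expected sizes. `ExpectedFootprint` and its helpers are hypothetical names, not part of Spark. The boxed estimate assumes 4-byte compressed references and 16-byte java.lang.Long objects, each value boxed separately, which is typical for a 64-bit HotSpot JVM but not guaranteed. Array headers are ignored.

```scala
object ExpectedFootprint {
  // Bytes taken by the element data of a primitive Array[Long] (array header ignored).
  def primitiveLongArrayBytes(capacity: Long): Long =
    capacity * java.lang.Long.BYTES

  // Rough bytes for an array of individually boxed java.lang.Long values.
  // ASSUMPTION: 4-byte compressed references and 16-byte Long objects;
  // both figures depend on the JVM and its flags.
  def boxedLongArrayBytes(capacity: Long, refBytes: Int = 4, boxedBytes: Int = 16): Long =
    capacity * (refBytes + boxedBytes)

  def toMiB(bytes: Long): Double = bytes / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val capacity = 1L << 20 // 1 048 576, the _keySet.capacity quoted above
    println(s"primitive: ${toMiB(primitiveLongArrayBytes(capacity))} MiB")
    println(s"boxed (rough): ${toMiB(boxedLongArrayBytes(capacity))} MiB")
  }
}
```

Under these assumptions the primitive array comes out at 8 MiB, which is the figure I expected to see for the values.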

The consequence of this problem is that if you put more than 100M Long entries 
into this map, the GC gets choked to death...

This is the ScalaMeter code I used:
{code}
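// Note: PrimitiveKeyOpenHashMap is private[spark], so this class has to live
// in a package under org.apache.spark to compile.
import org.apache.spark.util.SizeEstimator
import org.apache.spark.util.collection.PrimitiveKeyOpenHashMap
import org.scalameter.api._

class PrimitiveKeyOpenHashMapBench extends Bench.ForkedTime {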

  override def measurer = new Executor.Measurer.MemoryFootprint

  val sizes = Gen.single("size")(1*1000*1000)

  performance of "MemoryFootprint" in {
    performance of "PrimitiveKeyOpenHashMap" in {
      using(sizes) config (
        exec.benchRuns -> 1,
        exec.maxWarmupRuns -> 0,
        exec.independentSamples -> 1,
        exec.requireGC -> true,
        exec.jvmflags -> List("-server", "-Xms1024m", "-Xmx6548m", "-XX:+UseG1GC")
      ) in { size =>
          val map = new PrimitiveKeyOpenHashMap[Long, Long](size)
          var index = 0L
          while (index < size) {
            map(index) = 0L
            index+=1
          }
          println("Size " + SizeEstimator.estimate(map))
          while (index != 0) {
            index-=1
            assert(map.contains(index))
          }
          map
      }
    }
  }
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
