[ https://issues.apache.org/jira/browse/SPARK-49491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878686#comment-17878686 ]

Yuming Wang edited comment on SPARK-49491 at 9/3/24 2:46 AM:
-------------------------------------------------------------

LongMap vs HashMap
{code:scala}
    import scala.util.Random
    import org.apache.spark.benchmark.Benchmark

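    // Number of entries pre-populated in each map.
    val size = 4000

    val map1 = new collection.mutable.HashMap[Long, Object]()
    val map2 = new collection.mutable.LongMap[Object]()

    // Fill both maps with the same keys: 0 until size.
    Range(0, size).foreach { id =>
      map1.put(id, new Object())
      map2.put(id, new Object())
    }

    // Pre-generate the lookup keys outside the timed cases. Keys fall in
    // [0, size + 10), so a few lookups miss at first and insert a new entry.
    val keys = Range(1, size * 20000).map { _ =>
      new Random().nextInt(size + 10)
    }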

    val benchmark = new Benchmark("Benchmark Map", size, minNumIters = 30)
    benchmark.addCase("HashMap") { _ =>
      keys.foreach { key => map1.getOrElseUpdate(key, new Object()) }
    }

    benchmark.addCase("LongMap") { _ =>
      keys.foreach { key => map2.getOrElseUpdate(key, new Object()) }
    }

    benchmark.run()
{code}
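
For reference, a minimal sketch of the swap that the benchmark compares, assuming a map keyed by Long ids. The names {{before}} and {{after}} and the {{value-N}} strings are hypothetical and not taken from Spark; only the collection classes and the {{put}} / {{getOrElseUpdate}} calls come from the benchmark above. It can be pasted into a Scala REPL or run as a script.
{code:scala}
import scala.collection.mutable

// Hypothetical maps keyed by Long ids; names and values are illustrative only.
val before = new mutable.HashMap[Long, String]()
val after = new mutable.LongMap[String]()

Seq(1L, 2L, 3L).foreach { id =>
  before.put(id, s"value-$id")
  after.put(id, s"value-$id")
}

// getOrElseUpdate has the same contract on both maps: an existing key
// returns the stored value, a missing key stores and returns the default.
val hit = after.getOrElseUpdate(2L, "unused")
val miss = after.getOrElseUpdate(42L, "value-42")

// Apply the same update to the HashMap and confirm both hold equal entries.
before.getOrElseUpdate(42L, "value-42")
assert(before.toMap == after.toMap)
{code}
Because the calls used here are shared by both classes, the change at a call site is mostly the declared type, assuming the surrounding code does not depend on HashMap-specific behavior.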

Scala 2.12.18:
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_382-b05 on Mac OS X 14.5
Apple M2 Max
Benchmark Map:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
HashMap                                            1870           1888          29          0.0      467420.1       1.0X
LongMap                                             629            634           4          0.0      157347.9       3.0X
{noformat}
Scala 2.13.8:
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_382-b05 on Mac OS X 14.5
Apple M2 Max
Benchmark Map:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
HashMap                                             735            759          52          0.0      183763.2       1.0X
LongMap                                             570            575           4          0.0      142464.2       1.3X
{noformat}
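
A quick arithmetic check on the two tables, as a sketch. The Relative column matches the HashMap best time divided by the LongMap best time, and the Per Row(ns) figures are consistent with the best time divided by {{size}} (4000) rather than by the number of lookups. Each iteration walks all {{size * 20000 - 1}} keys, so a rough per-lookup cost can be derived as below. The variable names are illustrative.
{code:scala}
// Best Time(ms) values copied from the two tables above:
// (Scala version, HashMap best ms, LongMap best ms).
val results = Seq(
  ("2.12.18", 1870.0, 629.0),
  ("2.13.8", 735.0, 570.0)
)

// Range(1, size * 20000) yields size * 20000 - 1 keys per iteration.
val size = 4000
val lookupsPerIter = (size * 20000 - 1).toDouble

results.foreach { case (scalaVersion, hashMapMs, longMapMs) =>
  // Speedup of LongMap over HashMap, i.e. the "Relative" column.
  val relative = hashMapMs / longMapMs
  // Convert ms per iteration to ns per individual lookup.
  val hashMapNs = hashMapMs * 1e6 / lookupsPerIter
  val longMapNs = longMapMs * 1e6 / lookupsPerIter
  println(f"Scala $scalaVersion: LongMap $relative%.1fX, " +
    f"HashMap $hashMapNs%.1f ns/lookup, LongMap $longMapNs%.1f ns/lookup")
}
{code}
These per-lookup numbers are only as precise as the rounded best times in the tables.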




> Replace HashMap with LongMap or AnyRefMap
> -----------------------------------------
>
>                 Key: SPARK-49491
>                 URL: https://issues.apache.org/jira/browse/SPARK-49491
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> JDK 1.8:
> {noformat}
> OpenJDK 64-Bit Server VM 1.8.0_382-b05 on Mac OS X 14.5
> Apple M2 Max
> Benchmark Map:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> HashMap                                            2028           2063          98          0.0      506933.0       1.0X
> AnyRefMap                                          1901           1936          19          0.0      475346.3       1.1X
> {noformat}
> Java 17:
> {noformat}
> OpenJDK 64-Bit Server VM 17.0.7+7-LTS on Mac OS X 14.5
> Apple M2 Max
> Benchmark Map:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> HashMap                                            1575           1615          47          0.0      393832.3       1.0X
> AnyRefMap                                          1495           1502           5          0.0      373664.5       1.1X
> {noformat}



