Wan Kun created SPARK-39325:
-------------------------------
Summary: Improve MapOutputTracker convertMapStatuses performance
Key: SPARK-39325
URL: https://issues.apache.org/jira/browse/SPARK-39325
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Wan Kun
How to reproduce this issue:
{code:java}
val benchmark = new Benchmark("MapStatuses Convert", 1, output = output)
val blockManagerNumber = 1000
val mapNumber = 50000
val shufflePartitions = 10000
val shuffleId: Int = 0
// First reduce task will fetch map data from startPartition to endPartition
val startPartition = 0
val startMapIndex = 0
val endMapIndex = mapNumber
val blockManagers = Array.tabulate(blockManagerNumber) { i =>
  BlockManagerId("a", "host" + i, 7337)
}
val mapStatuses: Array[MapStatus] = Array.tabulate(mapNumber) { mapTaskId =>
  HighlyCompressedMapStatus(
    blockManagers(mapTaskId % blockManagerNumber),
    Array.tabulate(shufflePartitions)(i => if (i % 50 == 0) 1 else 0),
    mapTaskId)
}
val bitmap = new RoaringBitmap()
Range(0, 4000).foreach(bitmap.add(_))
val mergeStatuses = Array.tabulate(shufflePartitions) { part =>
  MergeStatus(blockManagers(part % blockManagerNumber), shuffleId, bitmap, 100)
}
// numIters (iterations per case) is not defined in this snippet;
// it must be supplied by the surrounding benchmark code.
Array(499, 999, 1499).foreach { endPartition =>
  benchmark.addCase(
    s"Num Maps: $mapNumber Fetch partitions:${endPartition - startPartition + 1}",
    numIters) { _ =>
    MapOutputTracker.convertMapStatuses(
      shuffleId,
      startPartition,
      endPartition,
      mapStatuses,
      startMapIndex,
      endMapIndex,
      Some(mergeStatuses))
  }
}
benchmark.run()
{code}
Benchmark result
{code:java}
================================================================================================
MapStatuses Convert Benchmark
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
MapStatuses Convert:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Num Maps: 50000 Fetch partitions:500               3393           3483          96          0.0  3393439257.0       1.0X
Num Maps: 50000 Fetch partitions:1000              6640           6772         121          0.0  6639654832.0       0.5X
Num Maps: 50000 Fetch partitions:1500             10035          10143         108          0.0 10035100069.0       0.3X
{code}
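One way to read these numbers is to divide each best time by the number of (map, partition) pairs the call has to visit. The sketch below does only that arithmetic, using the best times reported in the benchmark result; the object name PerPairCost and the per-pair framing are illustrative assumptions, not part of Spark or of the original report. Under that reading the cost per pair comes out roughly constant across the three cases, which would suggest the running time grows linearly with maps times fetched partitions.

```scala
object PerPairCost {
  // Number of map outputs used in the benchmark (mapNumber in the snippet).
  val mapNumber = 50000L

  // (fetched partitions -> best time in ms), copied from the benchmark result.
  val bestTimeMs: Seq[(Long, Long)] = Seq(500L -> 3393L, 1000L -> 6640L, 1500L -> 10035L)

  // Nanoseconds spent per (map, partition) pair for each case.
  val nsPerPair: Seq[(Long, Double)] = bestTimeMs.map { case (partitions, ms) =>
    val pairs = mapNumber * partitions
    partitions -> (ms * 1e6 / pairs)
  }

  def main(args: Array[String]): Unit = {
    nsPerPair.foreach { case (partitions, ns) =>
      println(f"$partitions%5d partitions: $ns%.1f ns per (map, partition) pair")
    }
  }
}
```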