Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/612#discussion_r9925630

    --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala ---
    @@ -83,6 +83,28 @@ class ExternalAppendOnlyMapSuite extends FunSuite with BeforeAndAfter with Local
           (3, Set[Int](30))))
       }
     
    +  test("insert with collision on hashCode Int.MaxValue") {
    +    val conf = new SparkConf(false)
    +    sc = new SparkContext("local", "test", conf)
    +
    +    val map = new ExternalAppendOnlyMap[Int, Int, ArrayBuffer[Int]](createCombiner,
    +      mergeValue, mergeCombiners)
    +
    +    map.insert(Int.MaxValue, 10)
    +    map.insert(2, 20)
    +    map.insert(3, 30)
    +    map.insert(Int.MaxValue, 100)
    +    map.insert(2, 200)
    +    map.insert(Int.MaxValue, 1000)
    +    val it = map.iterator
    +    assert(it.hasNext)
    +    val result = it.toSet[(Int, ArrayBuffer[Int])].map(kv => (kv._1, kv._2.toSet))
    +    assert(result == Set[(Int, Set[Int])](
    +      (Int.MaxValue, Set[Int](10, 100, 1000)),
    +      (2, Set[Int](20, 200)),
    +      (3, Set[Int](30))))
    --- End diff --
    
    Even after setting the memory parameters, we still need to insert a large number of entries into the map to induce spilling. I have been able to trigger the exception that you found with the following:
    
        (1 until 100000).foreach { i => map.insert(i, i) }
        map.insert(Int.MaxValue, Int.MaxValue)
        val it = map.iterator
        while (it.hasNext) {
          it.next()
        }
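    A minimal sketch of how that spill-inducing check could be wrapped into a test in this suite, assuming spark.shuffle.memoryFraction is one of the "memory parameters" referred to above (the config key and threshold value are illustrative, and createCombiner, mergeValue, mergeCombiners come from the existing suite):
    
        test("insert with hash collision after spilling") {
          // Assumption: lowering spark.shuffle.memoryFraction shrinks the map's
          // spill threshold so spilling starts after relatively few inserts.
          val conf = new SparkConf(false)
          conf.set("spark.shuffle.memoryFraction", "0.001")
          sc = new SparkContext("local", "test", conf)
    
          val map = new ExternalAppendOnlyMap[Int, Int, ArrayBuffer[Int]](
            createCombiner, mergeValue, mergeCombiners)
    
          // Insert many distinct keys to force at least one spill to disk,
          // then add the key whose hashCode is Int.MaxValue.
          (1 until 100000).foreach { i => map.insert(i, i) }
          map.insert(Int.MaxValue, Int.MaxValue)
    
          // Iterating merges the in-memory and spilled streams; this is where
          // the reported exception surfaced for the Int.MaxValue bucket.
          val it = map.iterator
          while (it.hasNext) {
            it.next()
          }
        }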