Repository: spark
Updated Branches:
  refs/heads/master aac13fb48 -> 7b8415401


[SPARK-12494][MLLIB] Array out of bound Exception in KMeans Yarn Mode

## What changes were proposed in this pull request?

Better error message when k-means init can't get enough samples from the input (perhaps because it is empty)

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <[email protected]>

Closes #11979 from srowen/SPARK-12494.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b841540
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b841540
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b841540

Branch: refs/heads/master
Commit: 7b841540180e8d1403d6c95b02e93f129267b34f
Parents: aac13fb
Author: Sean Owen <[email protected]>
Authored: Mon Mar 28 12:01:33 2016 +0100
Committer: Sean Owen <[email protected]>
Committed: Mon Mar 28 12:01:33 2016 +0100

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7b841540/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index a7beb81..37a21cd 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -390,6 +390,8 @@ class KMeans private (
     // Initialize each run's first center to a random point.
     val seed = new XORShiftRandom(this.seed).nextInt()
     val sample = data.takeSample(true, runs, seed).toSeq
+    // Could be empty if data is empty; fail with a better message early:
+    require(sample.size >= runs, s"Required $runs samples but got ${sample.size} from $data")
     val newCenters = Array.tabulate(runs)(r => ArrayBuffer(sample(r).toDense))
 
     /** Merges new centers to centers. */
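
For illustration only, the guard added in this patch can be sketched without a Spark dependency. Everything named below (InitSampleCheck, takeSampleWithReplacement, firstCenters) is hypothetical and not part of KMeans.scala; the sampling helper is a simplified stand-in that assumes, as the patch comment says, that sampling an empty input yields an empty result.

```scala
import scala.util.Random

object InitSampleCheck {
  // Hypothetical stand-in for data.takeSample(true, num, seed):
  // an empty input yields an empty sample, otherwise num draws with replacement.
  def takeSampleWithReplacement[T](data: IndexedSeq[T], num: Int, seed: Long): Seq[T] = {
    if (data.isEmpty) Seq.empty
    else {
      val rng = new Random(seed)
      Seq.fill(num)(data(rng.nextInt(data.length)))
    }
  }

  // Picks one starting center per run, mirroring the shape of the patched code.
  def firstCenters(data: IndexedSeq[Array[Double]], runs: Int, seed: Long): Array[Array[Double]] = {
    val sample = takeSampleWithReplacement(data, runs, seed)
    // Same idea as the patch: fail early with a descriptive message
    // instead of an out-of-bounds error when indexing sample(r) below.
    require(sample.size >= runs, s"Required $runs samples but got ${sample.size}")
    Array.tabulate(runs)(r => sample(r).clone())
  }

  def main(args: Array[String]): Unit = {
    val centers = firstCenters(Vector(Array(1.0, 2.0), Array(3.0, 4.0)), runs = 2, seed = 42L)
    println(s"picked ${centers.length} centers")

    try {
      firstCenters(Vector.empty, runs = 2, seed = 42L)
    } catch {
      // require throws IllegalArgumentException when the condition is false.
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

With empty input, the failure surfaces as an IllegalArgumentException raised by require, carrying the "Required ... samples but got ..." message, rather than as an index error from sample(r).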
