Repository: spark
Updated Branches:
  refs/heads/master aac13fb48 -> 7b8415401


[SPARK-12494][MLLIB] Array out of bound Exception in KMeans Yarn Mode

## What changes were proposed in this pull request?

Fail early with a clearer error message when k-means initialization cannot
draw enough samples from the input (most likely because the input is empty).
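
To illustrate the failure mode, here is a minimal sketch that does not depend on Spark. The object `EmptySampleSketch`, its local `takeSample` helper, and the `initUnchecked`/`initChecked` names are hypothetical stand-ins introduced for illustration; only the `require` line mirrors the patch. It assumes, as the comment in the patch states, that sampling from empty data yields an empty sample.

```scala
import scala.util.{Random, Try}

object EmptySampleSketch {
  // Hypothetical stand-in for sampling with replacement:
  // an empty input produces an empty sample, whatever `num` is.
  def takeSample(data: Seq[Array[Double]], num: Int, seed: Long): Seq[Array[Double]] =
    if (data.isEmpty) Seq.empty
    else {
      val rng = new Random(seed)
      Seq.fill(num)(data(rng.nextInt(data.length)))
    }

  // Without the check: indexing into a too-short sample fails
  // with an index-out-of-bounds error that says nothing about the cause.
  def initUnchecked(data: Seq[Array[Double]], runs: Int): Array[Array[Double]] = {
    val sample = takeSample(data, runs, 42L)
    Array.tabulate(runs)(r => sample(r))
  }

  // With the check: fail early with a message naming the real problem.
  def initChecked(data: Seq[Array[Double]], runs: Int): Array[Array[Double]] = {
    val sample = takeSample(data, runs, 42L)
    require(sample.size >= runs, s"Required $runs samples but got ${sample.size} from $data")
    Array.tabulate(runs)(r => sample(r))
  }

  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[Array[Double]]
    println(Try(initUnchecked(empty, 1))) // fails with an index error
    println(Try(initChecked(empty, 1)))   // fails with IllegalArgumentException
  }
}
```

With the check in place, the caller sees an `IllegalArgumentException` that reports how many samples were required and how many were obtained, rather than a bare index error from the `Array.tabulate` call.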

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <[email protected]>

Closes #11979 from srowen/SPARK-12494.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b841540
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b841540
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b841540

Branch: refs/heads/master
Commit: 7b841540180e8d1403d6c95b02e93f129267b34f
Parents: aac13fb
Author: Sean Owen <[email protected]>
Authored: Mon Mar 28 12:01:33 2016 +0100
Committer: Sean Owen <[email protected]>
Committed: Mon Mar 28 12:01:33 2016 +0100

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala  | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7b841540/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index a7beb81..37a21cd 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -390,6 +390,8 @@ class KMeans private (
     // Initialize each run's first center to a random point.
     val seed = new XORShiftRandom(this.seed).nextInt()
     val sample = data.takeSample(true, runs, seed).toSeq
+    // Could be empty if data is empty; fail with a better message early:
+    require(sample.size >= runs, s"Required $runs samples but got ${sample.size} from $data")
     val newCenters = Array.tabulate(runs)(r => ArrayBuffer(sample(r).toDense))
 
     /** Merges new centers to centers. */

