[jira] [Comment Edited] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns

Yanbo Liang (JIRA) Wed, 16 Dec 2015 01:39:07 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059753#comment-15059753
 ]


Yanbo Liang edited comment on SPARK-12363 at 12/16/15 9:38 AM:
---------------------------------------------------------------

After I removed [this line|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L388],
[this|https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala#L71]
test case failed.
I have pasted the test cases here. Strangely, the following two test cases
are based on the same dataset, but one succeeds and the other fails.
Another clue: when `setInitializationMode("degree")` is used to train the
PIC model, both of the following test cases pass. With
`setInitializationMode("random")`, the second test case fails.
{code}
test("power iteration clustering") {
    /*
     We use the following graph to test PIC. All edges are assigned similarity 1.0 except 0.1 for
     edge (3, 4).
     15-14 -13 -12
     |           |
     4 . 3 - 2  11
     |   | x |   |
     5   0 - 1  10
     |           |
     6 - 7 - 8 - 9
     */

    val similarities = Seq[(Long, Long, Double)]((0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (1, 2, 1.0),
      (1, 3, 1.0), (2, 3, 1.0), (3, 4, 0.1), // (3, 4) is a weak edge
      (4, 5, 1.0), (4, 15, 1.0), (5, 6, 1.0), (6, 7, 1.0), (7, 8, 1.0), (8, 9, 1.0), (9, 10, 1.0),
      (10, 11, 1.0), (11, 12, 1.0), (12, 13, 1.0), (13, 14, 1.0), (14, 15, 1.0))
    val model = new PowerIterationClustering()
      .setK(2)
      .run(sc.parallelize(similarities, 2))
    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    model.assignments.collect().foreach { a =>
      predictions(a.cluster) += a.id
    }
    assert(predictions.toSet == Set((0 to 3).toSet, (4 to 15).toSet))

    val model2 = new PowerIterationClustering()
      .setK(2)
      .setInitializationMode("degree")
      .run(sc.parallelize(similarities, 2))
    val predictions2 = Array.fill(2)(mutable.Set.empty[Long])
    model2.assignments.collect().foreach { a =>
      predictions2(a.cluster) += a.id
    }
    assert(predictions2.toSet == Set((0 to 3).toSet, (4 to 15).toSet))
  }

  test("power iteration clustering on graph") {
    /*
     We use the following graph to test PIC. All edges are assigned similarity 1.0 except 0.1 for
     edge (3, 4).
     15-14 -13 -12
     |           |
     4 . 3 - 2  11
     |   | x |   |
     5   0 - 1  10
     |           |
     6 - 7 - 8 - 9
     */

    val similarities = Seq[(Long, Long, Double)]((0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (1, 2, 1.0),
      (1, 3, 1.0), (2, 3, 1.0), (3, 4, 0.1), // (3, 4) is a weak edge
      (4, 5, 1.0), (4, 15, 1.0), (5, 6, 1.0), (6, 7, 1.0), (7, 8, 1.0), (8, 9, 1.0), (9, 10, 1.0),
      (10, 11, 1.0), (11, 12, 1.0), (12, 13, 1.0), (13, 14, 1.0), (14, 15, 1.0))

    val edges = similarities.flatMap { case (i, j, s) =>
      if (i != j) {
        Seq(Edge(i, j, s), Edge(j, i, s))
      } else {
        None
      }
    }
    val graph = Graph.fromEdges(sc.parallelize(edges, 2), 0.0)

    val model = new PowerIterationClustering()
      .setK(2)
      .run(graph)
    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    model.assignments.collect().foreach { a =>
      predictions(a.cluster) += a.id
    }
    assert(predictions.toSet == Set((0 to 3).toSet, (4 to 15).toSet))

    val model2 = new PowerIterationClustering()
      .setK(2)
      .setInitializationMode("degree")
      .run(sc.parallelize(similarities, 2))
    val predictions2 = Array.fill(2)(mutable.Set.empty[Long])
    model2.assignments.collect().foreach { a =>
      predictions2(a.cluster) += a.id
    }
    assert(predictions2.toSet == Set((0 to 3).toSet, (4 to 15).toSet))
  }
{code}
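
Note that the assertions in both test cases compare sets of vertex-id sets, so they do not depend on which cluster index each group receives. A failing assertion therefore means the vertices were partitioned differently, not merely labeled differently. A minimal Spark-free sketch of that check follows; `Assignment`, `ClusterCheck` and `groupIds` are hypothetical names introduced for illustration, and the assignments are hand-written rather than produced by PIC.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the (id, cluster) records in model.assignments.
final case class Assignment(id: Long, cluster: Int)

object ClusterCheck {
  // Collect vertex ids per cluster index, then discard the indices.
  def groupIds(assignments: Seq[Assignment], k: Int): Set[Set[Long]] = {
    val predictions = Array.fill(k)(mutable.Set.empty[Long])
    assignments.foreach { a => predictions(a.cluster) += a.id }
    predictions.map(_.toSet).toSet
  }

  def main(args: Array[String]): Unit = {
    val expected = Set((0L to 3L).toSet, (4L to 15L).toSet)
    // Same partition with opposite labels: both match the expected grouping.
    val a = (0L to 15L).map(id => Assignment(id, if (id <= 3) 0 else 1))
    val b = a.map(x => x.copy(cluster = 1 - x.cluster))
    // A different partition: vertex 4 joins the small cluster.
    val c = (0L to 15L).map(id => Assignment(id, if (id <= 4) 0 else 1))
    println(groupIds(a, 2) == expected) // true
    println(groupIds(b, 2) == expected) // true
    println(groupIds(c, 2) == expected) // false
  }
}
```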


> PowerIterationClustering test case failed if we deprecated KMeans.setRuns
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12363
>                 URL: https://issues.apache.org/jira/browse/SPARK-12363
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, MLlib
>            Reporter: Yanbo Liang
>
> We plan to deprecate the `runs` parameter of KMeans. PowerIterationClustering
> uses KMeans to train its model.
> I removed the `setRuns` call used in PowerIterationClustering, but one of the
> test cases failed.
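
For context, `runs` in MLlib's KMeans sets how many times the algorithm is started from random initial conditions, with the best clustering found over all runs being returned. The following is a minimal one-dimensional sketch of that idea, not Spark's implementation. `MultiRunKMeans`, `singleRun` and `bestOf` are hypothetical names, the iteration count of 10 is arbitrary, and the sketch makes no claim about why the PIC test fails.

```scala
import scala.util.Random

object MultiRunKMeans {
  // Index of the center closest to x.
  def nearest(centers: Array[Double], x: Double): Int =
    centers.indices.minBy(i => math.abs(centers(i) - x))

  // Sum of squared distances from each point to its closest center.
  def cost(points: Seq[Double], centers: Array[Double]): Double =
    points.map { x =>
      val d = x - centers(nearest(centers, x))
      d * d
    }.sum

  // One run of Lloyd's algorithm, started from k randomly chosen points.
  def singleRun(points: Seq[Double], k: Int, iterations: Int, rng: Random): Array[Double] = {
    var centers = rng.shuffle(points).take(k).toArray
    for (_ <- 0 until iterations) {
      val groups = points.groupBy(x => nearest(centers, x))
      // Keep the old center when a cluster receives no points.
      centers = centers.indices.map { i =>
        groups.get(i).map(g => g.sum / g.size).getOrElse(centers(i))
      }.toArray
    }
    centers
  }

  // Repeat the run `runs` times (runs >= 1) and keep the lowest-cost centers.
  def bestOf(points: Seq[Double], k: Int, runs: Int, seed: Long): Array[Double] = {
    val rng = new Random(seed)
    Seq.fill(runs)(singleRun(points, k, 10, rng)).minBy(c => cost(points, c))
  }
}
```

With `runs = 1` the result depends entirely on a single random start, so a test that previously relied on the best of several runs may become sensitive to the seed.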



