Hi guys, I know that RDDs are immutable and therefore their values cannot be changed, but I see the following behaviour: I wrote an implementation of the FuzzyCMeans algorithm and now I'm testing it, so I run the following example:
import org.apache.spark.mllib.clustering.FuzzyCMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.textFile("/home/development/myPrjects/R/butterfly/butterfly.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
> parsedData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
> = MapPartitionsRDD[2] at map at <console>:31

val numClusters = 2
val numIterations = 20

parsedData.foreach { point => println(point) }
> [0.0,-8.0]
> [-3.0,-2.0]
> [-3.0,0.0]
> [-3.0,2.0]
> [-2.0,-1.0]
> [-2.0,0.0]
> [-2.0,1.0]
> [-1.0,0.0]
> [0.0,0.0]
> [1.0,0.0]
> [2.0,-1.0]
> [2.0,0.0]
> [2.0,1.0]
> [3.0,-2.0]
> [3.0,0.0]
> [3.0,2.0]
> [0.0,8.0]

val clusters = FuzzyCMeans.train(parsedData, numClusters, numIterations)

parsedData.foreach { point => println(point) }
> [0.0,-0.4803333185624595]
> [-0.1811743096972924,-0.12078287313152826]
> [-0.06638890786148487,0.0]
> [-0.04005925925925929,0.02670617283950619]
> [-0.12193263222069807,-0.060966316110349035]
> [-0.0512,0.0]
> [NaN,NaN]
> [-0.049382716049382706,0.0]
> [NaN,NaN]
> [0.006830134553650707,0.0]
> [0.05120000000000002,-0.02560000000000001]
> [0.04755220304297078,0.0]
> [0.06581619798335057,0.03290809899167529]
> [0.12010867103812725,-0.0800724473587515]
> [0.10946638900458144,0.0]
> [0.14814814814814817,0.09876543209876545]
> [0.0,0.49119985188436205]

But how can it be that my method changes the immutable RDD?

BTW, the signature of the train method is the following:

    train(data: RDD[Vector], clusters: Int, maxIterations: Int)

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Vector-Immutability-issue-tp15827.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
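To illustrate what I mean by "changes": here is a minimal plain-Scala sketch (no Spark involved; a mutable Array stands in for the vector data held by a cached partition, and all names are made up for illustration) of how a reference that is never reassigned can still show different element values when the elements themselves are mutated in place:

```scala
// Hypothetical stand-in for cached element data, like the values
// backing the DenseVectors held by a cached RDD partition.
val cached = Array(Array(0.0, -8.0), Array(-3.0, -2.0))

// A second handle to the same data; this reference is never reassigned.
val view: Array[Array[Double]] = cached

// Mutating an element in place through one handle...
cached(0)(0) = 99.0

// ...is visible through the other handle, even though both vals are unchanged.
println(view(0).mkString("[", ",", "]")) // prints [99.0,-8.0]
```

Is something like this what is happening to my cached parsedData?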