Thanks for the suggestion. Can anyone offer any advice on the
ClassCastException going from Java to Scala? Why does calling JavaRDD.rdd()
and then collect() result in this exception?
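
For reference, here is a Scala-only sketch of the validation along the lines
of that suggestion (I haven't actually run it against real data). It assumes
data is an RDD[String] as in the original example, and it simply drops any
line that doesn't split into exactly two fields rather than substituting
zeros:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// keep only lines that split into exactly two fields, then build dense vectors
val parsedData: RDD[Vector] = data
  .map(_.split(","))
  .filter(_.length == 2)
  .map(fields => Vectors.dense(fields(0).toDouble, fields(1).toDouble))

That would sidestep the Java interop entirely, though it still doesn't explain
the ClassCastException above.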

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
wrote:

> How about
>
> data.map(s => s.split(",")).filter(_.length > 1).map(good_entry =>
>   Vectors.dense(good_entry(0).toDouble, good_entry(1).toDouble))
>
> (full disclosure, I didn't actually run this). But after the first map you
> should have an RDD[Array[String]]; then you'd discard everything shorter
> than 2 and convert the rest to dense vectors. In fact, if you're expecting
> length to be exactly 2, you might want to filter == 2 instead...
>
>
> On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel <devl.developm...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I'm trying a simple K-Means example as per the website:
>>
>> val parsedData = data.map(s =>
>> Vectors.dense(s.split(',').map(_.toDouble)))
>>
>> but I'm trying to write a Java-based validation method first so that
>> missing values are omitted or replaced with 0.
>>
>> public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
>>     JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
>>         public Iterable<Vector> call(String s) {
>>             String[] split = s.split(",");
>>             ArrayList<Vector> add = new ArrayList<Vector>();
>>             if (split.length != 2) {
>>                 // missing or extra fields: substitute a (0, 0) vector
>>                 add.add(Vectors.dense(0, 0));
>>             } else {
>>                 add.add(Vectors.dense(Double.parseDouble(split[0]),
>>                         Double.parseDouble(split[1])));
>>             }
>>             return add;
>>         }
>>     });
>>
>>     return words.rdd();
>> }
>>
>> When I then call this from Scala:
>>
>> val parsedData=dc.prepareKMeans(data);
>> val p=parsedData.collect();
>>
>> I get Exception in thread "main" java.lang.ClassCastException:
>> [Ljava.lang.Object; cannot be cast to
>> [Lorg.apache.spark.mllib.linalg.Vector;
>>
>> Why is the class tag Object rather than Vector?
>>
>> 1) How do I get this working correctly using the Java validation example
>> above? Or
>> 2) How can I modify val parsedData = data.map(s =>
>> Vectors.dense(s.split(',').map(_.toDouble))) so that lines where the
>> split produces fewer than 2 fields are ignored? Or
>> 3) Is there a better way to do input validation first?
>>
>> Using Spark and MLlib:
>> libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
>> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
>>
>> Many thanks in advance
>> Dev
>>
>
>
