Gramce created SPARK-12750:
------------------------------
Summary: Java class method doesn't work properly
Key: SPARK-12750
URL: https://issues.apache.org/jira/browse/SPARK-12750
Project: Spark
Issue Type: Question
Reporter: Gramce
I use the Java Spark API to transform LabeledPoint data.
I want to select several columns from a JavaRDD<LabeledPoint>, for example
the first three columns.
So I wrote this:
    int[] ad = {1, 2, 3};
    int b = ad.length;
    JavaRDD<LabeledPoint> ggd = parsedData.map(
        new Function<LabeledPoint, LabeledPoint>() {
            public LabeledPoint call(LabeledPoint a) {
                double[] v = new double[b];
                for (int i = 0; i < b; i++) {
                    v[i] = a.features().toArray()[ad[i]];
                }
                return new LabeledPoint(a.label(), Vectors.dense(v));
            }
        });
where parsedData is a JavaRDD<LabeledPoint>.
Now I want to turn this into a method, so the code is like this:
    class myrddd {
        public JavaRDD<LabeledPoint> abcd;

        public myrddd(JavaRDD<LabeledPoint> deff) {
            abcd = deff;
        }

        public JavaRDD<LabeledPoint> abcdf(int[] asdf, int b) {
            JavaRDD<LabeledPoint> bcd = abcd;
            JavaRDD<LabeledPoint> mms = bcd.map(
                new Function<LabeledPoint, LabeledPoint>() {
                    public LabeledPoint call(LabeledPoint a) {
                        double[] v = new double[b];
                        for (int i = 0; i < b; i++) {
                            v[i] = a.features().toArray()[asdf[i]];
                        }
                        return new LabeledPoint(a.label(), Vectors.dense(v));
                    }
                });
            return mms;
        }
    }
And:

    myrddd ndfs = new myrddd(parsedData);
    JavaRDD<LabeledPoint> ggdf = ndfs.abcdf(ad, b);

But this doesn't work. Following is the error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.map(RDD.scala:317)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:93)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
    at anbv.qwe.myrddd.abcdf(dfa.java:53)
    at anbv.qwe.dfa.main(dfa.java:42)
Caused by: java.io.NotSerializableException: anbv.qwe.myrddd
Serialization stack:
    - object not serializable (class: anbv.qwe.myrddd, value: anbv.qwe.myrddd@310aee0b)
    - field (class: anbv.qwe.myrddd$1, name: this$0, type: class anbv.qwe.myrddd)
    - object (class anbv.qwe.myrddd$1, anbv.qwe.myrddd$1@4b76aa5a)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 13 more
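The "Serialization stack" above already names the likely cause: the anonymous Function is compiled into myrddd$1, which carries a hidden this$0 reference to the enclosing myrddd instance, and myrddd does not implement Serializable. Declaring "class myrddd implements java.io.Serializable" (or moving the function into a static nested class) should avoid the exception. The following plain-JDK sketch reproduces the same failure mode without Spark; the Fn, Broken, and Fixed names are illustrative stand-ins, not Spark API:

    import java.io.ByteArrayOutputStream;
    import java.io.NotSerializableException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class Main {
        // Stand-in for org.apache.spark.api.java.function.Function,
        // which also extends Serializable.
        interface Fn extends Serializable {
            int apply(int x);
        }

        // Mirrors myrddd as written: the enclosing class is NOT Serializable.
        public static class Broken {
            public Fn make() {
                // Anonymous inner class: carries a hidden this$0 reference to Broken.
                return new Fn() {
                    public int apply(int x) { return x + 1; }
                };
            }
        }

        // One possible fix: make the enclosing class Serializable,
        // so the captured this$0 can be written out too.
        public static class Fixed implements Serializable {
            public Fn make() {
                return new Fn() {
                    public int apply(int x) { return x + 1; }
                };
            }
        }

        // Tries Java serialization, the same mechanism Spark's
        // ClosureCleaner uses to ship the closure to executors.
        public static boolean serializes(Object o) {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
                out.writeObject(o);
                return true;
            } catch (NotSerializableException e) {
                return false; // the failure Spark reports as "Task not serializable"
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public static void main(String[] args) {
            System.out.println(serializes(new Broken().make())); // false: this$0 not serializable
            System.out.println(serializes(new Fixed().make()));  // true
        }
    }

The inline version in the report works because there the anonymous class captures only the local variables ad and b, not a class instance; the refactored version additionally drags in this$0.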
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]