[ https://issues.apache.org/jira/browse/SPARK-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151233#comment-14151233 ]
Pulkit Bhuwalka commented on SPARK-3274:
----------------------------------------
SparkConf sparkConf = new SparkConf().setAppName("Page Rank").setMaster("local[4]");
JavaSparkContext context = new JavaSparkContext(sparkConf);
JavaPairRDD<String, String> transformedLinkMap =
        context.sequenceFile(pageRankOptions.getFileLocation(),
                String.class, String.class, 1)
            .mapToPair(new PairFunction<Tuple2<String, String>, String, String>() {
                @Override
                public Tuple2<String, String> call(Tuple2<String, String> urlAndLinks)
                        throws Exception {
                    // return new Tuple2<String, String>(urlAndLinks._1(), urlAndLinks._2());
                    return new Tuple2<String, String>(
                            urlAndLinks._1(),
                            new LinkDetails(1.0,
                                    new LinkParser().parse(urlAndLinks._2())).toString());
                }
            });
When I use the commented-out line above, which simply returns the strings, everything
works. However, when I use the code after it, which merely parses the string into a
LinkDetails object, the job fails with a ClassCastException:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.lang.String
    at io.pulkit.cmu.acc.project1.phase2.PageRankSparkJob$1.call(PageRankSparkJob.java:28)
    at io.pulkit.cmu.acc.project1.phase2.PageRankSparkJob$1.call(PageRankSparkJob.java:24)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:926)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:926)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1167)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
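In case it helps, one workaround I would try (untested sketch; LinkDetails and
LinkParser are my own classes, and this needs import org.apache.hadoop.io.Text) is to
read the keys and values as Text, the type the runtime actually delivers according to
the trace, and convert to String explicitly:

    // Untested workaround sketch: ask for hadoop.io.Text instead of String.class
    // and do the String conversion by hand, so no Text-to-String cast is needed.
    JavaPairRDD<String, String> transformedLinkMap =
            context.sequenceFile(pageRankOptions.getFileLocation(),
                    Text.class, Text.class, 1)
                .mapToPair(new PairFunction<Tuple2<Text, Text>, String, String>() {
                    @Override
                    public Tuple2<String, String> call(Tuple2<Text, Text> urlAndLinks)
                            throws Exception {
                        // Text.toString() copies the bytes, so Hadoop's reuse of
                        // Writable instances is not a problem here.
                        return new Tuple2<String, String>(
                                urlAndLinks._1().toString(),
                                new LinkDetails(1.0, new LinkParser()
                                        .parse(urlAndLinks._2().toString())).toString());
                    }
                });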
I looked at the other issue mentioned, but the pull request link there no longer works,
and that issue is marked as resolved in 0.9, whereas I am on 1.1.0.
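For the collectAsMap failure in the original report (quoted below), a workaround that
might apply, untested, using the v1/v2 variables from the reproduce code: collect() the
pairs and build the map on the driver, which sidesteps the Tuple2[] cast that fails
inside PairRDDFunctions.collectAsMap:

    // Untested sketch; needs import java.util.Map and java.util.HashMap.
    // JavaPairRDD.collect() returns a java.util.List<Tuple2<K, V>>, so the
    // map can be assembled without going through collectAsMap() at all.
    Map<String, String> asMap = new HashMap<String, String>();
    for (Tuple2<String, String> pair : v1.collect()) {
        asMap.put(pair._1(), pair._2());
    }
    System.out.println(v2.toString() + ": " + asMap.toString());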
Thanks a lot.
> Spark Streaming Java API reports java.lang.ClassCastException when calling collectAsMap on JavaPairDStream
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-3274
> URL: https://issues.apache.org/jira/browse/SPARK-3274
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 1.0.2
> Reporter: Jack Hu
>
> Reproduce code:
>     scontext
>         .socketTextStream("localhost", 18888)
>         .mapToPair(new PairFunction<String, String, String>() {
>             public Tuple2<String, String> call(String arg0) throws Exception {
>                 return new Tuple2<String, String>("1", arg0);
>             }
>         })
>         .foreachRDD(new Function2<JavaPairRDD<String, String>, Time, Void>() {
>             public Void call(JavaPairRDD<String, String> v1, Time v2) throws Exception {
>                 System.out.println(v2.toString() + ": " + v1.collectAsMap().toString());
>                 return null;
>             }
>         });
> Exception:
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lscala.Tuple2;
>     at org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:447)
>     at org.apache.spark.api.java.JavaPairRDD.collectAsMap(JavaPairRDD.scala:464)
>     at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:90)
>     at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:88)
>     at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
>     at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at scala.util.Try$.apply(Try.scala:161)
>     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobS