[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173383#comment-15173383 ]
Jeff Zhang commented on SPARK-13581:
------------------------------------

I suspect this is an issue in code generation. The root cause is that the query should read the {{features}} column, but it actually reads the {{label}} column, which causes the MatchError. {{df.show()}} succeeds when no column is selected. The stack trace shows the error comes from the code generator. Could anyone familiar with code generation help with this?

{code}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost): scala.MatchError: 0.0 (of class java.lang.Double)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:63)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:60)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:40)
    at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:305)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
{code}

> LibSVM throws MatchError
> ------------------------
>
>                 Key: SPARK-13581
>                 URL: https://issues.apache.org/jira/browse/SPARK-13581
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jakob Odersky
>            Assignee: Jeff Zhang
>            Priority: Minor
>
> When running an action on a DataFrame obtained by reading from a libsvm file, a MatchError is thrown; however, doing the same on a cached DataFrame works fine.
> {code}
> val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") // file is in spark repository
> df.select(df("features")).show() // MatchError
> df.cache()
> df.select(df("features")).show() // OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]   at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]   at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]   at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]   at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight into what could be going on?
> cc [~iyounus]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
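The trace in the comment above passes through {{RDDConversions.rowToRowRdd}}, which applies one converter per output column by position. The sketch below is a minimal, Spark-free illustration of one hypothesis consistent with the observation that {{label}} is read instead of {{features}}: the scan hands back full {{(label, features)}} rows even though only {{features}} was requested, so the vector serializer receives the label {{Double}}. All names here ({{Vec}}, {{DenseVec}}, {{PrunedScanSketch}}, {{serializeVector}}, {{project}}) are hypothetical stand-ins, not Spark's {{VectorUDT}}, its converters, or the libsvm relation. The projection step only illustrates what a column-pruning scan is expected to do; it is not the actual fix.

```scala
// Hypothetical stand-ins, NOT Spark's VectorUDT / CatalystTypeConverters / LibSVMRelation.
sealed trait Vec
final case class DenseVec(values: Seq[Double]) extends Vec

object PrunedScanSketch {
  // Column order of the libsvm source: label (Double), then features (vector).
  val fullSchema: Seq[String] = Seq("label", "features")

  // Mimics a vector serializer that only accepts vectors; any other value
  // falls through the match and raises scala.MatchError.
  def serializeVector(obj: Any): Seq[Double] = obj match {
    case DenseVec(vs) => vs
  }

  // Converters for the query select(df("features")): one per requested column,
  // applied strictly by position.
  val requestedConverters: Seq[Any => Any] = Seq((v: Any) => serializeVector(v))

  def convert(row: Seq[Any]): Seq[Any] =
    requestedConverters.zip(row).map { case (f, v) => f(v) }

  // What a column-pruning scan should do: keep only the required columns, in order.
  def project(row: Seq[Any], required: Seq[String]): Seq[Any] =
    required.map(name => row(fullSchema.indexOf(name)))

  val sampleRow: Seq[Any] = Seq(1.0, DenseVec(Seq(0.0, 3.0)))

  // Unpruned (hypothesized bug): position 0 is still the label, so the
  // vector serializer is handed a boxed Double.
  def convertUnpruned(): String =
    try { convert(sampleRow); "ok" }
    catch { case e: MatchError => s"MatchError: ${e.getMessage}" }

  // Pruned: projecting first lines the features column up with its converter.
  def convertPruned(): Seq[Any] = convert(project(sampleRow, Seq("features")))

  def main(args: Array[String]): Unit = {
    println(convertUnpruned())
    println(convertPruned())
  }
}
```

Under these assumptions, {{convertUnpruned()}} produces the same message seen in the report, {{MatchError: 1.0 (of class java.lang.Double)}}, while {{convertPruned()}} serializes the vector normally.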