Sorry, I was using Spark 1.3.x.
I cannot reproduce it in master. But should I still open a JIRA, so that I can request the fix be back-ported to the 1.3.x branch? Thanks again!

________________________________
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, May 09, 2015 2:41 AM
To: Haopu Wang
Cc: user; dev@spark.apache.org
Subject: Re: [SparkSQL] cannot filter by a DateType column

What version of Spark are you using? It appears that, at least in master, we are doing the conversion correctly, but it's possible older versions of applySchema do not. If you can reproduce the same bug in master, can you open a JIRA?

On Fri, May 8, 2015 at 1:36 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

I want to filter a DataFrame based on a Date column. If the DataFrame object is constructed from a Scala case class, it works (comparing either as a String or as a Date). But if the DataFrame is generated by applying a schema to an RDD, it doesn't work. Below are the exception and the test code. Do you have any idea about the error? Thank you very much!

================exception=================

java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
    at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2$$anonfun$apply$6.apply(Cast.scala:116)
    at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:111)
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2.apply(Cast.scala:116)
    at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:426)
    at org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual.eval(predicates.scala:305)
    at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
    at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

================code=================

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DateType, StructField, StructType}

val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlCtx = new HiveContext(sc)
import sqlCtx.implicits._

case class Test(dt: java.sql.Date)

// new java.sql.Date(115, 4, 7) is the (deprecated) constructor for 2015-05-07
val df = sc.makeRDD(Seq(Test(new java.sql.Date(115, 4, 7)))).toDF

// Filtering works when the DataFrame comes from a case class.
var r = df.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("======")

var r2 = df.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show
println("======")

// "df2" doesn't filter correctly!
val rdd2 = sc.makeRDD(Seq(Row(new java.sql.Date(115, 4, 7))))
val schema = StructType(Array(StructField("dt", DateType, false)))
val df2 = sqlCtx.applySchema(rdd2, schema)

r = df2.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("======")

r2 = df2.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show
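
For anyone stuck on 1.3.x, one possible workaround, going by the behavior shown above (the case-class path converts java.sql.Date correctly, while applySchema apparently does not), is to route the RDD through a case class instead of applySchema. Below is a minimal sketch, not a confirmed fix; it assumes the same sc and import sqlCtx.implicits._ from the test code, and DateRow is a hypothetical helper class.

================workaround sketch=================

// Workaround sketch for Spark 1.3.x: build the DataFrame through a case
// class (which the tests above show handles java.sql.Date correctly)
// instead of sqlCtx.applySchema. Assumes sc and sqlCtx.implicits._ are
// in scope as in the test code above.
case class DateRow(dt: java.sql.Date)

val rawDates = sc.makeRDD(Seq(new java.sql.Date(115, 4, 7)))
val df3 = rawDates.map(d => DateRow(d)).toDF   // instead of applySchema(rdd2, schema)

df3.filter("dt >= '2015-05-06'").show          // should filter as expected

The ClassCastException ("java.sql.Date cannot be cast to java.lang.Integer") suggests that Catalyst stores DateType values internally as an Int, and that applySchema in 1.3.x hands the predicate the raw java.sql.Date without performing that conversion; the case-class path appears to do the conversion, which is why the sketch above avoids the bug.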