[ 
https://issues.apache.org/jira/browse/SPARK-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108715#comment-15108715
 ] 

kevin yu commented on SPARK-12911:
----------------------------------

I will look into this . Thanks.
Kevin

> Cacheing a dataframe causes array comparisons to fail (in filter / where) 
> after 1.6
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12911
>                 URL: https://issues.apache.org/jira/browse/SPARK-12911
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>         Environment: OSX 10.11.1, Scala 2.11.7, Spark 1.6.0
>            Reporter: Jesse English
>
> When doing a *where* operation on a dataframe and testing for equality on an 
> array type, after 1.6 no valid comparisons are made if the dataframe has been 
> cached.  If it has not been cached, the results are as expected.
> This appears to be related to the underlying unsafe array data types.
> {code:title=test.scala|borderStyle=solid}
> test("test array comparison") {
>     val vectors: Vector[Row] =  Vector(
>       Row.fromTuple("id_1" -> Array(0L, 2L)),
>       Row.fromTuple("id_2" -> Array(0L, 5L)),
>       Row.fromTuple("id_3" -> Array(0L, 9L)),
>       Row.fromTuple("id_4" -> Array(1L, 0L)),
>       Row.fromTuple("id_5" -> Array(1L, 8L)),
>       Row.fromTuple("id_6" -> Array(2L, 4L)),
>       Row.fromTuple("id_7" -> Array(5L, 6L)),
>       Row.fromTuple("id_8" -> Array(6L, 2L)),
>       Row.fromTuple("id_9" -> Array(7L, 0L))
>     )
>     val data: RDD[Row] = sc.parallelize(vectors, 3)
>     val schema = StructType(
>       StructField("id", StringType, false) ::
>         StructField("point", DataTypes.createArrayType(LongType, false), 
> false) ::
>         Nil
>     )
>     val sqlContext = new SQLContext(sc)
>     val dataframe = sqlContext.createDataFrame(data, schema)
>     val targetPoint:Array[Long] = Array(0L,9L)
>     //Cacheing is the trigger to cause the error (no cacheing causes no error)
>     dataframe.cache()
>     //This is the line where it fails
>     //java.util.NoSuchElementException: next on empty iterator
>     //However we know that there is a valid match
>     val targetRow = dataframe.where(dataframe("point") === 
> array(targetPoint.map(value => lit(value)): _*)).first()
>     assert(targetRow != null)
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to