[
https://issues.apache.org/jira/browse/SPARK-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646022#comment-14646022
]
James Aley commented on SPARK-8786:
-----------------------------------
It does? That doesn't appear to be the case for me in 1.4.1:
{code}
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val schema = StructType(StructField("x", BinaryType, nullable = false) :: Nil)
// Three rows, but only two distinct byte-array values.
val data = sc.parallelize(
  Row(Array[Byte](1.toByte)) ::
  Row(Array[Byte](1.toByte)) ::
  Row(Array[Byte](2.toByte)) :: Nil)
val df = sqlContext.createDataFrame(data, schema)
df.registerTempTable("test")
sqlContext.sql("SELECT DISTINCT x FROM test").show()
// Returned three rows, expected two.
// +-----------+
// | x|
// +-----------+
// |[B@7671a060|
// |[B@5f7b5e0c|
// |[B@2b7b225f|
// +-----------+
{code}
I can raise a separate bug if it helps; I just wanted to check whether this
ticket would already resolve the issue.
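For illustration, the kind of wrapper the ticket describes could be sketched as below. This is a hypothetical sketch, not Spark's actual internal class: `BinaryWrapper` is an assumed name, and the point is only that delegating to `java.util.Arrays` gives byte arrays content-based `equals`/`hashCode`, which is what DISTINCT and GROUP BY need.

```scala
import java.util.Arrays

// Hypothetical wrapper giving Array[Byte] value-based equality and hashing.
// Plain Scala/Java arrays compare by reference, so two arrays with identical
// contents are "different" to hash-based operators like DISTINCT.
final class BinaryWrapper(val bytes: Array[Byte]) {
  override def equals(other: Any): Boolean = other match {
    case that: BinaryWrapper => Arrays.equals(this.bytes, that.bytes)
    case _                   => false
  }
  override def hashCode: Int = Arrays.hashCode(bytes)
}

// Two distinct array instances with equal contents now compare equal,
// whereas the raw arrays do not.
val a = new BinaryWrapper(Array[Byte](1.toByte))
val b = new BinaryWrapper(Array[Byte](1.toByte))
assert(a == b)                              // content equality
assert(a.hashCode == b.hashCode)            // consistent hashing
assert(Array[Byte](1.toByte) != Array[Byte](1.toByte)) // reference equality
```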
> Create a wrapper for BinaryType
> -------------------------------
>
> Key: SPARK-8786
> URL: https://issues.apache.org/jira/browse/SPARK-8786
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Davies Liu
>
> The hashCode and equals() of Array[Byte] do not check the bytes; we should
> create a wrapper (internally) to do that.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)