[ https://issues.apache.org/jira/browse/SPARK-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645697#comment-14645697 ]
James Aley commented on SPARK-8786:
-----------------------------------
Will this change mean that joins, distinct, equality comparisons, etc. on
binary columns will work?
We serialise UUIDs as binary in our analytics system because we have to attach
one to every event, and there are billions of events, so the space saving over
a string is significant. Right now, our SQL queries and analysis jobs use a UDF
we've written to convert them back into UUID strings on the fly, so that we can
compare them.
Would I be right in thinking that once this issue is solved, we can skip that
transformation and should therefore see some speed-up?
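For context, the kind of conversion described above might look like the following. This is only a minimal sketch: the object name `UuidBytes`, its method names, and the big-endian 16-byte layout are assumptions made for illustration, not the actual UDF or encoding used in the system mentioned above. It does show where the space saving comes from: a UUID packs into 16 bytes, whereas its canonical string form is 36 characters.

```scala
import java.nio.ByteBuffer
import java.util.UUID

// Hypothetical helper: names and byte layout are assumptions for illustration.
object UuidBytes {
  // Pack a UUID into 16 bytes, most significant long first (big-endian).
  def toBytes(uuid: UUID): Array[Byte] = {
    val buf = ByteBuffer.allocate(16)
    buf.putLong(uuid.getMostSignificantBits)
    buf.putLong(uuid.getLeastSignificantBits)
    buf.array()
  }

  // Decode 16 bytes back into the canonical 36-character UUID string.
  def toUuidString(bytes: Array[Byte]): String = {
    require(bytes.length == 16, s"expected 16 bytes, got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes)
    val msb = buf.getLong()
    val lsb = buf.getLong()
    new UUID(msb, lsb).toString
  }
}
```

A function like `toUuidString` is what would be wrapped in a UDF so that the decoded strings, rather than the raw byte arrays, are compared.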
> Create a wrapper for BinaryType
> -------------------------------
>
> Key: SPARK-8786
> URL: https://issues.apache.org/jira/browse/SPARK-8786
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Davies Liu
>
> The hashCode() and equals() of Array[Byte] do not check the bytes, so we
> should create a wrapper (internally) that does.
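
The behaviour the issue describes can be reproduced in plain Scala, without Spark. The sketch below is illustrative only: `BinaryKey` is a hypothetical name and is not the wrapper Spark implements internally. It relies on `java.util.Arrays.equals` and `java.util.Arrays.hashCode`, which compare and hash array contents, whereas the array's own `equals` and `hashCode` are identity-based.

```scala
import java.util.Arrays

// Hypothetical wrapper (not Spark's internal class): gives a byte array
// content-based equality and hashing.
final class BinaryKey(val bytes: Array[Byte]) {
  override def equals(other: Any): Boolean = other match {
    case that: BinaryKey => Arrays.equals(bytes, that.bytes)
    case _               => false
  }

  override def hashCode(): Int = Arrays.hashCode(bytes)
}

object BinaryKeyDemo {
  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 3)

    // Arrays compare by reference, so equal contents are not "equal".
    println(a == b)                                        // false
    // Wrapped arrays compare by content.
    println(new BinaryKey(a) == new BinaryKey(b))          // true
    // Hash-based collections now treat them as one key.
    println(Set(new BinaryKey(a), new BinaryKey(b)).size)  // 1
  }
}
```

Hash-based operations such as distinct or hash joins depend on both methods agreeing on content, which is why the wrapper overrides `equals` and `hashCode` together.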
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)