[ https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252775#comment-17252775 ]
Ala Luszczak commented on SPARK-33594:
--------------------------------------
Big +1 here. Using a binary column as a partition column is a terrible idea.
I've seen at least two really bad scenarios result from this.
(1) When reading the data with the vectorized reader, I've seen segmentation
faults.
(2) When reading the same data with the non-vectorized (parquet-mr) reader, the
segmentation faults disappear, but instead incorrect values are returned for
the binary columns.
I would like to point out that just covering the CREATE TABLE statement might
not be enough. I think we should bail in the read path as well. After all, the
user can just do spark.read.parquet("my/path") without creating a table first.
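A minimal sketch of what such a read-path check could look like, written in
plain Python rather than against Spark internals. Everything here is
hypothetical: the function name assert_no_binary_partition_columns and the
representation of the partition schema as (column name, type name) pairs are
assumptions for illustration, not Spark APIs.

```python
from typing import Iterable, Tuple

# Hypothetical stand-in for an inferred partition schema:
# an iterable of (column name, type name) pairs.
PartitionSchema = Iterable[Tuple[str, str]]


def assert_no_binary_partition_columns(partition_schema: PartitionSchema) -> None:
    """Raise ValueError if any partition column has a binary type."""
    # Collect every offending column so the error names all of them at once.
    binary_cols = [
        name
        for name, type_name in partition_schema
        if type_name.lower() == "binary"
    ]
    if binary_cols:
        raise ValueError(
            "Binary type is not supported for partition columns: "
            + ", ".join(binary_cols)
        )
```

The idea is that the same check would run both when a table is created and when
a path is read directly, so that the direct-read case fails fast instead of
crashing or returning wrong values.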
> Forbid binary type as partition column
> --------------------------------------
>
> Key: SPARK-33594
> URL: https://issues.apache.org/jira/browse/SPARK-33594
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: angerszhu
> Priority: Major
>
> Forbid binary type as partition column