[jira] [Commented] (SPARK-33594) Forbid binary type as partition column

Ala Luszczak (Jira) Mon, 21 Dec 2020 03:10:36 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252775#comment-17252775
 ]


Ala Luszczak commented on SPARK-33594:
--------------------------------------

Big +1 here. Using a binary column as a partition column is a terrible idea.
I've seen at least two really bad scenarios result from this.

(1) When reading the data with the vectorized reader, I've seen segmentation 
faults.
(2) When reading the same data with the non-vectorized (parquet-mr) reader, the 
segmentation faults disappear, but instead incorrect values are returned for 
the binary columns.

I would like to point out that just covering the CREATE TABLE statement might 
not be enough. I think we should bail in the read path as well. After all, the 
user can just do spark.read.parquet("my/path") without creating a table first.
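
To make the "check every entry point" idea concrete, here is a rough sketch in plain Python. It is not Spark code: the schema is modeled as simple (name, type) pairs, and check_partition_columns and UnsupportedPartitionTypeError are hypothetical names made up for illustration. The point is only that one shared check, called from the CREATE TABLE path and from the read path alike, would leave no way around it.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a table schema: (column name, type name) pairs.
Column = Tuple[str, str]


class UnsupportedPartitionTypeError(ValueError):
    """Raised when a partition column has a type we refuse to partition by."""


def check_partition_columns(schema: Iterable[Column],
                            partition_by: Iterable[str]) -> None:
    """Raise if any partition column is of binary type.

    Intended to be called from every entry point (CREATE TABLE, write, read)
    so that no path can bypass the check.
    """
    # Map each column name to its lower-cased type name for lookup.
    types = {name: type_name.lower() for name, type_name in schema}
    # Collect the partition columns whose declared type is binary.
    bad: List[str] = [c for c in partition_by if types.get(c) == "binary"]
    if bad:
        raise UnsupportedPartitionTypeError(
            "binary type is not allowed for partition columns: " + ", ".join(bad)
        )
```

With a schema of [("id", "bigint"), ("payload", "binary")], partitioning by "id" passes silently, while partitioning by "payload" raises the error regardless of which path triggered the check.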

> Forbid binary type as partition column
> --------------------------------------
>
>                 Key: SPARK-33594
>                 URL: https://issues.apache.org/jira/browse/SPARK-33594
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: angerszhu
>            Priority: Major
>
> Forbid binary type as partition column



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

