[jira] [Commented] (SPARK-2446) Add BinaryType support to Parquet I/O.

Teng Qiu (JIRA) Tue, 22 Jul 2014 07:28:57 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070310#comment-14070310 ]


Teng Qiu commented on SPARK-2446:
---------------------------------

Hi [~marmbrus], Impala also creates Parquet files without the UTF8 annotation 
for strings. I just tried the newest Impala release in CDH 5.1.0, and this is 
still the case.

Would it be possible to add one more parameter to sqlContext.parquetFile() to 
enable or disable BinaryType?

I am not sure whether it is worth it, but it would be more flexible, and in 
our use case we have many Parquet files that were created by Impala... :)

For example, change
def parquetFile(path: String): SchemaRDD
to
def parquetFile(path: String, allowBinaryType: Boolean = true): SchemaRDD

Users could then call sqlContext.parquetFile(xxx, false) to read Parquet files 
written by older Spark versions and by Impala.

That way it stays backward compatible. What do you think?
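
To make the proposal concrete, here is a minimal sketch of how the flag could 
influence the type chosen for a Parquet BINARY column. It is a self-contained 
toy model, not Spark code: BinaryColumn, ParquetTypeSketch and toSqlType are 
hypothetical names, and the local StringType/BinaryType objects only stand in 
for the Spark SQL types. It assumes that allowBinaryType = false should mean 
"read every BINARY column as a string, whether or not it is annotated".

```scala
// Hypothetical, simplified model of the proposal. None of these names are
// Spark or Parquet APIs; they only illustrate the intended behavior.
sealed trait SqlType
case object StringType extends SqlType
case object BinaryType extends SqlType

// A Parquet BINARY column, with or without the UTF8 annotation.
final case class BinaryColumn(name: String, utf8Annotated: Boolean)

object ParquetTypeSketch {
  // allowBinaryType = true  (default): only UTF8-annotated columns are strings;
  //                                    unannotated BINARY becomes BinaryType.
  // allowBinaryType = false:           every BINARY column is read as a string,
  //                                    which suits files written without the
  //                                    UTF8 annotation (assumed semantics).
  def toSqlType(col: BinaryColumn, allowBinaryType: Boolean = true): SqlType =
    if (col.utf8Annotated || !allowBinaryType) StringType else BinaryType
}
```

With this model, an unannotated column such as 
BinaryColumn("name", utf8Annotated = false) maps to BinaryType by default and 
to StringType when toSqlType is called with allowBinaryType = false, while an 
annotated column maps to StringType in both cases.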

> Add BinaryType support to Parquet I/O.
> --------------------------------------
>
>                 Key: SPARK-2446
>                 URL: https://issues.apache.org/jira/browse/SPARK-2446
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>             Fix For: 1.1.0
>
>
> To support {{BinaryType}}, the following changes are needed:
> - Make {{StringType}} use {{OriginalType.UTF8}}
> - Add {{BinaryType}} using {{PrimitiveTypeName.BINARY}} without 
> {{OriginalType}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
