[
https://issues.apache.org/jira/browse/SPARK-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070310#comment-14070310
]
Teng Qiu commented on SPARK-2446:
---------------------------------
Hi [~marmbrus], Impala also writes Parquet files without the UTF8 annotation for
strings. I just tried the newest Impala release in CDH 5.1.0, and this is still
the case.
Would it be possible to add one more parameter to sqlContext.parquetFile() to
enable or disable BinaryType?
I am not sure whether it is worth it, but it would be more flexible, and in our
use case we have many Parquet files that were created by Impala... :)
For example, change
def parquetFile(path: String): SchemaRDD
to
def parquetFile(path: String, allowBinaryType: Boolean = true): SchemaRDD
so that users can call sqlContext.parquetFile(xxx, false) to read Parquet files
written by older Spark versions and by Impala.
That would keep it backward compatible. What do you think?
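A rough sketch of how such a flag could affect the type mapping for Parquet BINARY columns. This is a simplified, hypothetical model and not Spark's actual Parquet converter code: ParquetBinaryColumn, BinaryMapping and toCatalystType are made-up names for illustration, and the two type objects only stand in for Spark SQL's StringType and BinaryType.

```scala
// Hypothetical, simplified model; these are NOT Spark's or Parquet's real classes.
sealed trait CatalystType
case object StringType extends CatalystType
case object BinaryType extends CatalystType

// A Parquet BINARY column, with or without the UTF8 annotation.
final case class ParquetBinaryColumn(name: String, utf8Annotated: Boolean)

object BinaryMapping {
  // allowBinaryType = true:  only UTF8-annotated BINARY columns become strings,
  //                          unannotated ones are read as raw binary.
  // allowBinaryType = false: every BINARY column is read as a string, which is
  //                          what files written without the annotation need.
  def toCatalystType(col: ParquetBinaryColumn,
                     allowBinaryType: Boolean = true): CatalystType =
    if (col.utf8Annotated || !allowBinaryType) StringType else BinaryType
}
```

With the default, an unannotated column maps to BinaryType; passing allowBinaryType = false maps the same column to StringType, which matches the behaviour before BinaryType support was added.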
> Add BinaryType support to Parquet I/O.
> --------------------------------------
>
> Key: SPARK-2446
> URL: https://issues.apache.org/jira/browse/SPARK-2446
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Takuya Ueshin
> Assignee: Takuya Ueshin
> Fix For: 1.1.0
>
>
> To support {{BinaryType}}, the following changes are needed:
> - Make {{StringType}} use {{OriginalType.UTF8}}
> - Add {{BinaryType}} using {{PrimitiveTypeName.BINARY}} without
> {{OriginalType}}