[
https://issues.apache.org/jira/browse/SPARK-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070310#comment-14070310
]
Teng Qiu edited comment on SPARK-2446 at 7/22/14 2:47 PM:
----------------------------------------------------------
Hi [~marmbrus], Impala also creates Parquet files without the UTF8 annotation
for strings. I just tried the newest Impala release in CDH 5.1.0, and it still
does.
Would it be possible to add one more parameter to sqlContext.parquetFile() to
enable/disable BinaryType?
I am not sure whether it is worth it, but it would be more flexible, and in our
use case we have many Parquet files that were created by Impala... :)
For example, change
def parquetFile(path: String): SchemaRDD
to
def parquetFile(path: String, allowBinaryType: Boolean = true): SchemaRDD
By default allowBinaryType is set to true, but users can call
sqlContext.parquetFile(xxx, false) to read Parquet files written by older Spark
versions and by Impala.
That keeps it backward compatible. What do you think?
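A minimal sketch of how the proposed allowBinaryType flag could affect schema conversion on read. It uses simplified, hypothetical stand-ins (BinaryColumn, toCatalyst, and the two type objects are illustrative names, not Spark's or Parquet's actual API): a BINARY column annotated with UTF8 is always a string, and an un-annotated BINARY column becomes BinaryType only when the flag is true.

```scala
// Hypothetical, simplified model; the real converter works on parquet.schema
// types and Spark SQL (Catalyst) data types, not these stand-ins.
object BinaryTypeFlagSketch {
  sealed trait CatalystType
  case object StringType extends CatalystType
  case object BinaryType extends CatalystType

  // A Parquet BINARY column, optionally carrying the UTF8 annotation.
  final case class BinaryColumn(name: String, utf8Annotated: Boolean)

  // With allowBinaryType = false, un-annotated BINARY columns (as written by
  // Impala or older Spark versions) are read as strings instead of raw bytes.
  def toCatalyst(col: BinaryColumn, allowBinaryType: Boolean = true): CatalystType =
    if (col.utf8Annotated || !allowBinaryType) StringType
    else BinaryType
}
```

Giving the flag a default value means existing calls that pass only a path still compile unchanged; only callers reading Impala-written files need to pass false.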
was (Author: chutium):
Hi [~marmbrus], Impala also creates Parquet files without the UTF8 annotation
for strings. I just tried the newest Impala release in CDH 5.1.0, and it still
does.
Would it be possible to add one more parameter to sqlContext.parquetFile() to
enable/disable BinaryType?
I am not sure whether it is worth it, but it would be more flexible, and in our
use case we have many Parquet files that were created by Impala... :)
For example, change
def parquetFile(path: String): SchemaRDD
to
def parquetFile(path: String, allowBinaryType: Boolean = true): SchemaRDD
Users can call sqlContext.parquetFile(xxx, false) to read Parquet files written
by older Spark versions and by Impala.
That keeps it backward compatible. What do you think?
> Add BinaryType support to Parquet I/O.
> --------------------------------------
>
> Key: SPARK-2446
> URL: https://issues.apache.org/jira/browse/SPARK-2446
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Takuya Ueshin
> Assignee: Takuya Ueshin
> Fix For: 1.1.0
>
>
> To support {{BinaryType}}, the following changes are needed:
> - Make {{StringType}} use {{OriginalType.UTF8}}
> - Add {{BinaryType}} using {{PrimitiveTypeName.BINARY}} without
> {{OriginalType}}
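The two changes listed in the issue amount to a write-side type mapping in which both Spark SQL types share the BINARY primitive and differ only in the annotation. A simplified sketch, assuming hypothetical stand-in types (ParquetField and toParquet are illustrative, not the real parquet.schema classes or Spark's converter):

```scala
// Hypothetical stand-ins for Spark SQL types and a Parquet field description.
object ParquetWriteMappingSketch {
  sealed trait CatalystType
  case object StringType extends CatalystType
  case object BinaryType extends CatalystType

  // Parquet primitive type name plus an optional OriginalType annotation.
  final case class ParquetField(primitive: String, originalType: Option[String])

  // StringType is tagged with UTF8; BinaryType is plain BINARY, so a reader
  // can only tell them apart by the presence of the annotation.
  def toParquet(t: CatalystType): ParquetField = t match {
    case StringType => ParquetField("BINARY", Some("UTF8"))
    case BinaryType => ParquetField("BINARY", None)
  }
}
```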
--
This message was sent by Atlassian JIRA
(v6.2#6252)