[ https://issues.apache.org/jira/browse/SPARK-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075256#comment-14075256 ]
Teng Qiu edited comment on SPARK-2699 at 7/26/14 4:15 AM:
----------------------------------------------------------
Added a config property, parquet.binarytype (default: true).
We can use
{code}
sc.hadoopConfiguration.set("parquet.binarytype", "false")
{code}
to set the value of this property.
Code example:
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2) // take values as BinaryType
sc.hadoopConfiguration.set("parquet.binarytype", "false")
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2) // take values as StringType
{code}
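To make the intended behaviour of the flag concrete, here is a minimal, Spark-free sketch of how the type decision could be modelled. It is an illustration under assumptions, not the actual patch: the `ColumnType` hierarchy, the `BinaryTypeResolution` object and its `resolve` method are hypothetical names, and a plain `Map` stands in for the Hadoop configuration. It also assumes that a column carrying the UTF8 annotation is always read as a string regardless of the flag.

```scala
// Hypothetical stand-ins for the Catalyst types; not Spark's own classes.
sealed trait ColumnType
case object BinaryType extends ColumnType
case object StringType extends ColumnType

object BinaryTypeResolution {
  // Property name taken from the comment above.
  val BinaryTypeKey = "parquet.binarytype"

  // Decide how a Parquet binary column should be exposed.
  // hasUtf8Annotation: whether the column carries the UTF8 annotation.
  // conf: a plain Map standing in for sc.hadoopConfiguration (assumption).
  def resolve(hasUtf8Annotation: Boolean, conf: Map[String, String]): ColumnType = {
    // The property defaults to true, i.e. unannotated binary stays binary.
    val keepBinary = conf.getOrElse(BinaryTypeKey, "true").toBoolean
    if (hasUtf8Annotation || !keepBinary) StringType else BinaryType
  }
}
```

With an empty configuration an unannotated column stays binary; setting parquet.binarytype to "false" makes the same column resolve to a string, which is the case needed for files written by Spark 1.0.x or Impala.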
was (Author: chutium):
added a config property: parquet.binarytype, default is true
we can use
{code}
sc.hadoopConfiguration.set("parquet.binarytype", "false")
{code}
to set value of this property
code example:
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2) // take values as BinaryType
sc.hadoopConfiguration.set("parquet.binarytype", "false")
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2) // take values as StringType
{code}
> Improve compatibility with parquet file/table
> ---------------------------------------------
>
> Key: SPARK-2699
> URL: https://issues.apache.org/jira/browse/SPARK-2699
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Teng Qiu
>
> After SPARK-2446, compatibility is broken with Parquet files created by older
> Spark releases (Spark 1.0.x) and by Impala (all versions so far, up to
> 1.4.x-cdh5).
> Strings in those Parquet files are not annotated with UTF8, or contain only
> ASCII characters (Impala doesn't support UTF8 yet).
> This ticket aims to add a configuration option or a version check to
> support those Parquet files.
--
This message was sent by Atlassian JIRA
(v6.2#6252)