[ 
https://issues.apache.org/jira/browse/SPARK-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075256#comment-14075256
 ] 

Teng Qiu edited comment on SPARK-2699 at 7/26/14 4:15 AM:
----------------------------------------------------------

Added a config property, parquet.binarytype, which defaults to true (binary columns are read as BinaryType).

The property can be set with:
{code}
sc.hadoopConfiguration.set("parquet.binarytype", "false")
{code}

code example:
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Default (parquet.binarytype = true): values are read as BinaryType
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2)

// With the property set to false, values are read as StringType
sc.hadoopConfiguration.set("parquet.binarytype", "false")
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2)
{code}
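
The behaviour described above can be pictured as a small decision rule. The following is only a sketch under stated assumptions, not the code in the patch: `ColumnType`, `StringType`, `BinaryType`, `BinaryTypeSketch` and `resolve` are hypothetical stand-ins defined here for illustration (they are not Spark's classes), and it assumes that a column carrying the UTF8 annotation is always read as a string, so the flag only matters for un-annotated binary columns.

```scala
// Hypothetical stand-ins for the two column types involved; not Spark's own classes.
sealed trait ColumnType
case object StringType extends ColumnType
case object BinaryType extends ColumnType

object BinaryTypeSketch {
  // hasUtf8Annotation: whether the Parquet binary column is annotated as UTF8.
  // binaryTypeFlag: the value of the parquet.binarytype property (default true).
  def resolve(hasUtf8Annotation: Boolean, binaryTypeFlag: Boolean): ColumnType =
    if (hasUtf8Annotation) StringType   // assumption: annotated columns are always strings
    else if (binaryTypeFlag) BinaryType // default: un-annotated binary stays binary
    else StringType                     // flag set to false: treat it as a string

  def main(args: Array[String]): Unit = {
    // Old Spark 1.0.x / Impala file (no UTF8 annotation), default setting
    println(resolve(hasUtf8Annotation = false, binaryTypeFlag = true))
    // Same file after setting parquet.binarytype to false
    println(resolve(hasUtf8Annotation = false, binaryTypeFlag = false))
  }
}
```

Under these assumptions the flag changes nothing for files whose string columns already carry the UTF8 annotation, which is why it only needs to be set when reading the older files.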



was (Author: chutium):
Added a config property, parquet.binarytype, which defaults to true.

The property can be set with:
{code}
sc.hadoopConfiguration.set("parquet.binarytype", "false")
{code}

code example:
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._

// Default (parquet.binarytype = true): values are read as BinaryType
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2)

// With the property set to false, values are read as StringType
sc.hadoopConfiguration.set("parquet.binarytype", "false")
sqlContext.parquetFile("/tmp/old_parquet_by_spark_1.0.x").take(2)
{code}


> Improve compatibility with parquet file/table
> ---------------------------------------------
>
>                 Key: SPARK-2699
>                 URL: https://issues.apache.org/jira/browse/SPARK-2699
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Teng Qiu
>
> After SPARK-2446, compatibility is broken with Parquet files created by older 
> Spark releases (Spark 1.0.x) and by Impala (all versions to date, up to 
> 1.4.x-cdh5).
> Strings in those Parquet files are either not annotated with UTF8 or contain 
> only ASCII characters (Impala does not support UTF8 yet).
> This ticket aims to add a configuration option or a version check to 
> support those Parquet files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
