[ https://issues.apache.org/jira/browse/SPARK-21997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226957#comment-16226957 ]

Wenchen Fan commented on SPARK-21997:
-------------------------------------

A better fix is to add special handling of the varchar type in the read path, 
which pads string values with trailing blanks to satisfy the declared varchar 
length. But this may be hard, as you would need to fix both the normal reader 
and the columnar reader.
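For illustration only, here is a minimal sketch of the padding idea expressed 
against the public DataFrame API. The helper `padCharColumn` is hypothetical 
and not part of Spark; the actual fix would have to live inside the Parquet 
readers themselves, which is what makes it hard.

{code}
// Hypothetical sketch, not Spark's actual read path: pad a string column
// with trailing blanks up to the declared CHAR/VARCHAR length, mimicking
// what the reader would have to do on the read path.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.rpad

def padCharColumn(df: DataFrame, colName: String, declaredLength: Int): DataFrame =
  df.withColumn(colName, rpad(df(colName), declaredLength, " "))

// e.g. padCharColumn(spark.table("t_char"), "a", 10) turns "a" into "a         "
{code}

Doing the equivalent inside the readers is harder because the row-based reader 
and the vectorized (columnar) reader would each need the same handling.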

> Spark shows different results on char/varchar columns on Parquet
> ----------------------------------------------------------------
>
>                 Key: SPARK-21997
>                 URL: https://issues.apache.org/jira/browse/SPARK-21997
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.1, 2.2.0
>            Reporter: Dongjoon Hyun
>
> SPARK-19459 resolves CHAR/VARCHAR issues in general, but Spark shows 
> different results depending on the SQL configuration 
> *spark.sql.hive.convertMetastoreParquet*. We should fix this. Since the 
> default of `spark.sql.hive.convertMetastoreParquet` is `true`, the 
> result is wrong by default.
> {code}
> scala> sql("CREATE TABLE t_char(a CHAR(10), b VARCHAR(10)) STORED AS parquet")
> scala> sql("INSERT INTO TABLE t_char SELECT 'a', 'b'")
> scala> sql("SELECT * FROM t_char").show
> +---+---+
> |  a|  b|
> +---+---+
> |  a|  b|
> +---+---+
> scala> sql("set spark.sql.hive.convertMetastoreParquet=false")
> scala> sql("SELECT * FROM t_char").show
> +----------+---+
> |         a|  b|
> +----------+---+
> |a         |  b|
> +----------+---+
> {code}


