[jira] [Commented] (SPARK-23388) Support for Parquet Binary DecimalType in VectorizedColumnReader

Wenchen Fan (JIRA) Tue, 13 Feb 2018 00:43:49 -0800

    [ https://issues.apache.org/jira/browse/SPARK-23388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361990#comment-16361990 ]


Wenchen Fan commented on SPARK-23388:
-------------------------------------

This is an interoperability problem: although Spark SQL always writes 
large-precision decimals as fixed-length byte arrays, the Parquet spec also 
allows binary. Because of this bug, Spark 2.3 may fail to read Parquet files 
written by other systems.

cc [~sameerag] shall we include it in Spark 2.3.0?
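
For readers unfamiliar with the two encodings mentioned above: per the Parquet 
spec, a DECIMAL stored as either fixed-length byte array or binary holds the 
unscaled value as a big-endian two's-complement integer, so both can be decoded 
the same way. The sketch below is only an illustration of that rule, not 
Spark's reader code; the function name decode_parquet_decimal is hypothetical.

```python
from decimal import Decimal


def decode_parquet_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a Parquet DECIMAL from its byte representation.

    Hypothetical helper for illustration. `raw` is the big-endian
    two's-complement unscaled value; `scale` comes from the column's
    DECIMAL annotation.
    """
    # Works for any length: a padded fixed-length array and a minimal
    # variable-length binary yield the same integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)

    # Build the Decimal from (sign, digits, exponent) so the result is exact
    # and not rounded to the current decimal context precision.
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


# The same value written as variable-length binary and as a 16-byte
# fixed-length array (sign-extended padding).
as_binary = (12345).to_bytes(2, byteorder="big", signed=True)
as_fixed = (12345).to_bytes(16, byteorder="big", signed=True)
print(decode_parquet_decimal(as_binary, 2))
print(decode_parquet_decimal(as_fixed, 2))
```

The only difference a reader has to handle is where the length comes from: the 
schema for fixed-length byte arrays, a per-value length for binary.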

> Support for Parquet Binary DecimalType in VectorizedColumnReader
> ----------------------------------------------------------------
>
>                 Key: SPARK-23388
>                 URL: https://issues.apache.org/jira/browse/SPARK-23388
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: James Thompson
>            Assignee: James Thompson
>            Priority: Major
>             Fix For: 2.3.1
>
>
> The following commit to Spark removed support for decimal binary types: 
> [https://github.com/apache/spark/commit/9c29c557635caf739fde942f53255273aac0d7b1#diff-7bdf5fd0ce0b1ccbf4ecf083611976e6R428]
> As per the Parquet spec, decimal can be used to annotate binary types, so 
> support should be re-added: 
> [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal]



