[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17093:
------------------------------------
Assignee: Apache Spark
> Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-17093
> URL: https://issues.apache.org/jira/browse/SPARK-17093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Josh Rosen
> Assignee: Apache Spark
> Priority: Critical
>
> The following failing test demonstrates a bug where Spark mis-encodes
> array-of-struct fields if whole-stage codegen is disabled:
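> {code}
> // Disable whole-stage codegen so the interpreted evaluation path is used.
> withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
>   val data = Array(Array((1, 2), (3, 4)))
>   val ds = spark.sparkContext.parallelize(data).toDS()
>   // Fails: collect() returns Array(Array((3,4), (3,4))) instead of data.
>   assert(ds.collect() === data)
> }
> {code}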
> When whole-stage codegen is enabled (the default), this works fine. When it's
> disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}.
> Because the last element of the array appears to be repeated, my best guess is
> that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
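The hypothesis above (a missing {{copy()}} on a reused object) can be illustrated outside Spark. The following is a minimal sketch by analogy, not Spark's actual code path: MutableRow, decode, and CopyBugSketch are hypothetical names. It decodes each array element into one shared mutable buffer. Appending the buffer itself leaves every slot pointing at the same object, so the last element appears repeated; appending a copy preserves each value.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a mutable row buffer that a decoder reuses.
// This is NOT a Spark class; it only mimics the aliasing pattern.
final class MutableRow(var a: Int, var b: Int) {
  // Returns an independent snapshot of the current field values.
  def copy(): MutableRow = new MutableRow(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object CopyBugSketch {
  // Decodes each element by overwriting one shared buffer, then appends
  // either the buffer itself (buggy) or a copy of it (fixed).
  def decode(input: Seq[(Int, Int)], copyEach: Boolean): List[(Int, Int)] = {
    val buffer = new MutableRow(0, 0)
    val out = ArrayBuffer.empty[MutableRow]
    for ((x, y) <- input) {
      buffer.a = x
      buffer.b = y
      // Without copy(), every slot in `out` references the same object,
      // so all slots end up holding the last values written.
      out += (if (copyEach) buffer.copy() else buffer)
    }
    out.map(_.toTuple).toList
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, 2), (3, 4))
    println(decode(data, copyEach = false)) // prints List((3,4), (3,4))
    println(decode(data, copyEach = true))  // prints List((1,2), (3,4))
  }
}
```

The buggy variant reproduces the shape of the observed result {{Array(Array((3,4), (3,4)))}}, while the copying variant matches the expected {{data}}.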
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)