[
https://issues.apache.org/jira/browse/SPARK-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-15327:
------------------------------------
Assignee: Davies Liu (was: Apache Spark)
> Catalyst code generation fails with complex data structure
> ----------------------------------------------------------
>
> Key: SPARK-15327
> URL: https://issues.apache.org/jira/browse/SPARK-15327
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Jurriaan Pruis
> Assignee: Davies Liu
> Attachments: full_exception.txt
>
>
> Spark code generation fails with the following error when loading Parquet files with a complex nested structure:
> {code}
> : java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 158, Column 16: Expression "scan_isNull" is not an rvalue
> {code}
> The generated code on line 158 looks like:
> {code}
> /* 153 */     this.scan_arrayWriter23 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 154 */     this.scan_rowWriter40 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_holder, 1);
> /* 155 */   }
> /* 156 */
> /* 157 */   private void scan_apply_0(InternalRow scan_row) {
> /* 158 */     if (scan_isNull) {
> /* 159 */       scan_rowWriter.setNullAt(0);
> /* 160 */     } else {
> /* 161 */       // Remember the current cursor so that we can calculate how many bytes are
> /* 162 */       // written later.
> /* 163 */       final int scan_tmpCursor = scan_holder.cursor;
> /* 164 */
> {code}
> How to reproduce (PySpark):
> {code}
> # Some complex structure
> json = '{"h": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "d": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "c": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}, "a": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}, "e": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "g": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "f": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "b": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}}'
> # Write to parquet file
> sqlContext.read.json(sparkContext.parallelize([json])).write.mode('overwrite').parquet('test')
> # Try to read from parquet file (this generates an exception)
> sqlContext.read.parquet('test').collect()
> {code}
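> The long JSON literal in the repro is hard to audit by eye. A minimal sketch (plain Python, no Spark required) that builds an equivalent nested structure programmatically; the helper names {{leaf}} and {{group}} are illustrative, not part of the original report:
> {code}
> import json
>
> def leaf(e=None, count=None):
>     # One element of the inner arrays; keys absent in the report's JSON are omitted.
>     d = {}
>     if e is not None:
>         d["e"] = e
>     if count is not None:
>         d["count"] = count
>     return d
>
> def group(with_e=True):
>     # The repeated {"b": {"c": [...], "a": [...], "b": [...]}} sub-structure.
>     a = [leaf("testing", 3)] if with_e else [leaf(count=3)]
>     return {"b": {"c": [leaf("adfgd")], "a": a, "b": [leaf("test", 1)]}}
>
> # In the original literal, the "a" arrays under top-level keys a, b and c lack "e".
> doc = {k: group(with_e=k not in ("a", "b", "c")) for k in "hdcaegfb"}
> json_str = json.dumps(doc)
> {code}
> The resulting {{json_str}} can be fed to the same repro in place of the hand-written literal, e.g. {{sqlContext.read.json(sparkContext.parallelize([json_str]))}}.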
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)