[
https://issues.apache.org/jira/browse/SPARK-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-15327:
------------------------------------
Assignee: Davies Liu (was: Apache Spark)
> Catalyst code generation fails with complex data structure
> ----------------------------------------------------------
>
> Key: SPARK-15327
> URL: https://issues.apache.org/jira/browse/SPARK-15327
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Jurriaan Pruis
> Assignee: Davies Liu
> Attachments: full_exception.txt
>
>
> Spark code generation fails with the following error when loading Parquet files with a complex nested structure:
> {code}
> : java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 158, Column 16: Expression "scan_isNull" is not an rvalue
> {code}
> The generated code on line 158 looks like:
> {code}
> /* 153 */     this.scan_arrayWriter23 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 154 */     this.scan_rowWriter40 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_holder, 1);
> /* 155 */   }
> /* 156 */
> /* 157 */   private void scan_apply_0(InternalRow scan_row) {
> /* 158 */     if (scan_isNull) {
> /* 159 */       scan_rowWriter.setNullAt(0);
> /* 160 */     } else {
> /* 161 */       // Remember the current cursor so that we can calculate how many bytes are
> /* 162 */       // written later.
> /* 163 */       final int scan_tmpCursor = scan_holder.cursor;
> /* 164 */
> {code}
> How to reproduce (PySpark):
> {code}
> # Some complex structure
> json = '{"h": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "d": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "c": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}, "a": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}, "e": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "g": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "f": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}}, "b": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}}'
> # Write to parquet file
> sqlContext.read.json(sparkContext.parallelize([json])).write.mode('overwrite').parquet('test')
> # Try to read from parquet file (this generates an exception)
> sqlContext.read.parquet('test').collect()
> {code}
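> The long JSON literal in the repro is hard to audit by eye. A minimal sketch (plain Python, no Spark required) that builds an equivalent nested structure programmatically; the helper names {{leaf}} and {{group}} are illustrative, not part of the original report:
> {code}
> import json
>
> def leaf(e=None, count=None):
>     # One element of the inner arrays; keys absent in the report's JSON are omitted.
>     d = {}
>     if e is not None:
>         d["e"] = e
>     if count is not None:
>         d["count"] = count
>     return d
>
> def group(with_e=True):
>     # The repeated {"b": {"c": [...], "a": [...], "b": [...]}} sub-structure.
>     a = [leaf("testing", 3)] if with_e else [leaf(count=3)]
>     return {"b": {"c": [leaf("adfgd")], "a": a, "b": [leaf("test", 1)]}}
>
> # In the original literal, the "a" arrays under top-level keys a, b and c lack "e".
> doc = {k: group(with_e=k not in ("a", "b", "c")) for k in "hdcaegfb"}
> json_str = json.dumps(doc)
> {code}
> The resulting {{json_str}} can be fed to the same repro in place of the hand-written literal, e.g. {{sqlContext.read.json(sparkContext.parallelize([json_str]))}}.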
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)