[jira] [Commented] (SPARK-27913) Spark SQL's native ORC reader implements its own schema evolution

Giri Dandu (Jira) Fri, 17 Apr 2020 08:06:24 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-27913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085852#comment-17085852
 ]


Giri Dandu commented on SPARK-27913:
------------------------------------

[~viirya] Sorry for late reply.

I re-ran the same test in spark 2.4.5 and it *is NOT working* but it works in 
2.3.0. I get the same error in spark 2.4.5

 
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1Caused by: 
java.lang.ArrayIndexOutOfBoundsException: 1 at 
org.apache.orc.mapred.OrcStruct.getFieldValue(OrcStruct.java:49) at 
org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$org$apache$spark$sql$execution$datasources$orc$OrcDeserializer$$newWriter$14.apply(OrcDeserializer.scala:133)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$org$apache$spark$sql$execution$datasources$orc$OrcDeserializer$$newWriter$14.apply(OrcDeserializer.scala:123)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$2$$anonfun$apply$1.apply(OrcDeserializer.scala:51)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$2$$anonfun$apply$1.apply(OrcDeserializer.scala:51)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcDeserializer.deserialize(OrcDeserializer.scala:64)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2$$anonfun$apply$8.apply(OrcFileFormat.scala:234)
 at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2$$anonfun$apply$8.apply(OrcFileFormat.scala:233)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:104)
 at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source) at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255) 
{code}

> Spark SQL's native ORC reader implements its own schema evolution
> -----------------------------------------------------------------
>
>                 Key: SPARK-27913
>                 URL: https://issues.apache.org/jira/browse/SPARK-27913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.3
>            Reporter: Owen O'Malley
>            Priority: Major
>
> ORC's reader handles a wide range of schema evolution, but the Spark SQL 
> native ORC bindings do not provide the desired schema to the ORC reader. This 
> causes a regression when moving spark.sql.orc.impl from 'hive' to 'native'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-27913) Spark SQL's native ORC reader implements its own schema evolution

Reply via email to