[ https://issues.apache.org/jira/browse/SPARK-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-8448:
------------------------------------
    Target Version/s: 1.6.0  (was: 1.5.0)

> ORC data source doesn't support column names with comma
> -------------------------------------------------------
>
>                 Key: SPARK-8448
>                 URL: https://issues.apache.org/jira/browse/SPARK-8448
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>
> Spark shell snippet for reproduction:
> {code}
> sqlContext.range(0, 10).select('id as "a, b").write.format("orc").save("/tmp/foo")
> {code}
> Exception thrown:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>         at java.util.ArrayList.get(ArrayList.java:429)
>         at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.<init>(OrcStruct.java:190)
>         at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:529)
>         at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106)
>         at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcRelation.scala:76)
>         at org.apache.spark.sql.hive.orc.OrcRelation$$anon$1.newInstance(OrcRelation.scala:200)
>         at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:410)
>         at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:318)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:147)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:135)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:135)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This happens because the ORC SerDe requires a property named {{columns}}, which is a comma-separated list of output column names; a column name that itself contains a comma corrupts that list. We should catch this case during the analysis phase and throw an {{AnalysisException}} with a more helpful error message.
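The mismatch is easy to reproduce outside Spark. The sketch below is hypothetical Java (the class and variable names are made up, and it is not Hive's actual implementation); it only illustrates how splitting the {{columns}} property on commas turns the single column name "a, b" into two names while only one type is declared:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (not Hive's code) of why a comma inside a column name
// breaks the ORC SerDe: the "columns" table property is parsed as a
// comma-separated list, so the single column "a, b" looks like two columns.
public class CommaColumnDemo {
    public static void main(String[] args) {
        String columnsProp = "a, b";                  // one logical column name
        List<String> types = Arrays.asList("bigint"); // one type for one column

        List<String> names = Arrays.asList(columnsProp.split(","));
        System.out.println(names.size() + " names, " + types.size() + " type(s)");
        // -> 2 names, 1 type(s)

        // Pairing names with types then overruns the type list, mirroring
        // the IndexOutOfBoundsException thrown in OrcStructInspector.<init>.
        try {
            for (int i = 0; i < names.size(); i++) {
                System.out.println(names.get(i).trim() + ": " + types.get(i));
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught IndexOutOfBoundsException");
        }
    }
}
```

Because the column count is inferred from the split, the error surfaces deep inside the ORC writer rather than at analysis time, which is why a validation check with an {{AnalysisException}} would be a clearer place to fail.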



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
