[
https://issues.apache.org/jira/browse/HIVE-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-24504:
----------------------------------
Labels: pull-request-available (was: )
> VectorFileSinkArrowOperator does not serialize complex types correctly
> ----------------------------------------------------------------------
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records theĀ
> VectorFileSinkArrowOperator only serializes the primitive types correctly.
> For complex types only the main type is set which causes issues for clients
> trying to read data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>
> org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
> Source)
>
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
> Source)
>
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> org.apache.spark.scheduler.Task.run(Task.scala:109)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> at
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
> at
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
> at org.apache.spark.scheduler.Task.run(Task.scala:119)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)