[
https://issues.apache.org/jira/browse/SPARK-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Armbrust resolved SPARK-3891.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.3.0
Issue resolved by pull request 2802
[https://github.com/apache/spark/pull/2802]
> Support Hive Percentile UDAF with array of percentile values
> ------------------------------------------------------------
>
> Key: SPARK-3891
> URL: https://issues.apache.org/jira/browse/SPARK-3891
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Environment: Spark 1.2.0 trunk
> (ac302052870a650d56f2d3131c27755bb2960ad7) on
> CDH 5.1.0
> Centos 6.5
> 8x 2GHz, 24GB RAM
> Reporter: Anand Mohan Tumuluri
> Assignee: Venkata Ramana G
> Fix For: 1.3.0
>
>
> Spark PR 2620 added support for the Hive percentile UDAF. However, the Hive
> percentile and percentile_approx UDAFs also support returning an array of
> percentile values, using the syntax
> percentile(BIGINT col, array(p1 [, p2]...)) or
> percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]).
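> For reference, the array form returns one exact percentile per requested p,
> each computed by linear interpolation over the sorted input values. A minimal
> Python sketch of that semantics (exact_percentiles is a hypothetical helper
> for illustration, not Spark or Hive code):

```python
def exact_percentiles(values, ps):
    """Exact percentiles via linear interpolation at rank p*(n-1),
    mirroring the result Hive's exact percentile UDAF produces per p.
    (Illustrative sketch only; Hive internally aggregates a histogram.)"""
    xs = sorted(values)
    n = len(xs)
    result = []
    for p in ps:
        rank = p * (n - 1)         # fractional position in the sorted data
        lo = int(rank)             # index at or below the position
        hi = min(lo + 1, n - 1)    # next index, clamped at the last element
        frac = rank - lo
        result.append(xs[lo] + (xs[hi] - xs[lo]) * frac)
    return result

# e.g. exact_percentiles([10, 20, 30, 40], [0, 0.25, 0.5, 0.75, 1])
# returns [10, 17.5, 25.0, 32.5, 40]
```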
> Queries using this array form fail with the error below:
> 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name,
> percentile(turnaroundtime, array(0, 0.25, 0.5, 0.75, 1)) from exam group by name;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in
> stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com):
> java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be
> cast to [Ljava.lang.Object;
>     org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
>     org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
>     org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
>     org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
>     org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
>     org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
>     org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
>     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>     org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     org.apache.spark.scheduler.Task.run(Task.scala:56)
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)