Anand Mohan Tumuluri created SPARK-3891:
-------------------------------------------
Summary: Support Hive Percentile UDAF with array of percentile values
Key: SPARK-3891
URL: https://issues.apache.org/jira/browse/SPARK-3891
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Environment: Spark 1.2.0 trunk (ac302052870a650d56f2d3131c27755bb2960ad7) on CDH 5.1.0, CentOS 6.5, 8x 2GHz, 24GB RAM
Reporter: Anand Mohan Tumuluri
Spark PR 2620 adds support for the Hive percentile UDAF. However, the Hive percentile and percentile_approx UDAFs also accept an array of percentile values, returning an array of results, with the syntax
percentile(BIGINT col, array(p1 [, p2]...)) or
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B])
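For reference, the scalar form added by PR 2620 works while the array form fails. A minimal spark-shell sketch of both (table and column names follow the exam/turnaroundtime example from the failing query below):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)  // sc: the spark-shell SparkContext

  // Scalar percentile: supported since PR 2620.
  hiveContext.sql(
    "select name, percentile(turnaroundtime, 0.5) from exam group by name"
  ).collect()

  // Array of percentiles: throws the ClassCastException reported below.
  hiveContext.sql(
    "select name, percentile(turnaroundtime, array(0, 0.25, 0.5, 0.75, 1)) from exam group by name"
  ).collect()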
Queries using the array syntax fail with the error below:
0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name,
percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task
1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in stage
25.0 (TID 305, Dev-uuppala.sfohi.philips.com): java.lang.ClassCastException:
scala.collection.mutable.ArrayBuffer cannot be cast to [Ljava.lang.Object;
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
org.apache.spark.scheduler.Task.run(Task.scala:56)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)
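The cast fails because Hive's StandardListObjectInspector assumes list values are backed by a Java Object[], while Spark SQL hands the UDAF bridge a scala.collection.mutable.ArrayBuffer. A standalone sketch of the mismatch, and of the kind of conversion that would avoid it (illustrative only, not Spark's actual code):

  import scala.collection.mutable.ArrayBuffer

  val percentiles: Any = ArrayBuffer[Any](0.0, 0.25, 0.5, 0.75, 1.0)

  // StandardListObjectInspector.getListLength effectively does
  // ((Object[]) list).length, which fails on a Scala ArrayBuffer:
  try {
    percentiles.asInstanceOf[Array[AnyRef]].length
  } catch {
    case e: ClassCastException => println(e)  // the exception from this ticket
  }

  // Converting the buffer to a Java Object[] before it reaches the
  // object inspector avoids the cast failure:
  val asObjectArray: Array[AnyRef] =
    percentiles.asInstanceOf[ArrayBuffer[Any]].map(_.asInstanceOf[AnyRef]).toArray
  println(asObjectArray.length)  // 5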