[ https://issues.apache.org/jira/browse/SPARK-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171019#comment-14171019 ]

Venkata Ramana G commented on SPARK-3891:
-----------------------------------------

The following problems need to be fixed before an array can be passed to the
percentile and percentile_approx UDAFs (both are sketched below):
1. For the percentile UDAF, the parameters are not wrapped before being
passed to the UDAF.
2. percentile_approx accepts only a constant object inspector as its
parameter, so support for constant object inspectors needs to be added to
GenericUDAF.
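
A minimal sketch of what point 1 amounts to (a simplified stand-in, not the
actual patch): Hive's StandardListObjectInspector operates on Object[] (or a
java.util.List), so a Catalyst value such as
scala.collection.mutable.ArrayBuffer has to be converted before it reaches the
UDAF bridge; the unconverted ArrayBuffer is exactly the cast that fails in the
stack trace quoted below.

    import scala.collection.mutable.ArrayBuffer

    // Simplified stand-in for Spark's HiveInspectors-style wrapping (assumed
    // helper, not the actual fix): recursively turn Catalyst collection
    // values into the Object[] form Hive's list object inspectors expect.
    def wrap(value: Any): AnyRef = value match {
      case buf: ArrayBuffer[_] => buf.map(wrap).toArray  // ArrayBuffer -> Object[]
      case other               => other.asInstanceOf[AnyRef]
    }

    // The percentile array argument as Catalyst produces it...
    val percentiles = ArrayBuffer(0.0, 0.25, 0.5, 0.75, 1.0)
    // ...and the form that is safe to hand to a Hive object inspector.
    val wrapped = wrap(percentiles)

For point 2, a sketch of the constant object inspector that percentile_approx
demands for its percentiles argument, built with Hive's factory methods (the
usage here is illustrative only):

    import java.util.Arrays
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

    // percentile_approx checks that its percentiles argument arrives as a
    // ConstantObjectInspector; a literal array(0, 0.25, 0.5, 0.75, 1) maps
    // to a constant list inspector like this one.
    val constPercentilesOI = ObjectInspectorFactory.getStandardConstantListObjectInspector(
      PrimitiveObjectInspectorFactory.javaDoubleObjectInspector,
      Arrays.asList(0.0, 0.25, 0.5, 0.75, 1.0))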

> Support Hive Percentile UDAF with array of percentile values
> ------------------------------------------------------------
>
>                 Key: SPARK-3891
>                 URL: https://issues.apache.org/jira/browse/SPARK-3891
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>         Environment: Spark 1.2.0 trunk 
> (ac302052870a650d56f2d3131c27755bb2960ad7) on
> CDH 5.1.0
> Centos 6.5
> 8x 2GHz, 24GB RAM
>            Reporter: Anand Mohan Tumuluri
>            Assignee: Venkata Ramana G
>
> Spark PR 2620 brings in support for the Hive percentile UDAF.
> However, the Hive percentile and percentile_approx UDAFs also support
> returning an array of percentile values, with the syntax
> percentile(BIGINT col, array(p1 [, p2]...)) or
> percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B])
> Such queries fail with the error below:
> 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com): java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to [Ljava.lang.Object;
>         org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
>         org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
>         org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
>         org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
>         org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         org.apache.spark.scheduler.Task.run(Task.scala:56)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
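
For reference, a minimal way to reproduce this from Scala against a
Hive-enabled Spark 1.2 build (table and column names are taken from the report
above; the session setup itself is assumed):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // In Spark 1.2, HiveContext parses sql() with the HiveQL dialect by
    // default, so percentile resolves to the Hive UDAF.
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-3891-repro"))
    val hiveContext = new HiveContext(sc)

    // A scalar percentile works after PR 2620; the array form below hits the
    // ClassCastException in the quoted stack trace until the argument is
    // wrapped.
    hiveContext.sql(
      "select name, percentile(turnaroundtime, array(0, 0.25, 0.5, 0.75, 1)) " +
      "from exam group by name").collect()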


