Anand Mohan Tumuluri created SPARK-3891:
-------------------------------------------

             Summary: Support Hive Percentile UDAF with array of percentile values
                 Key: SPARK-3891
                 URL: https://issues.apache.org/jira/browse/SPARK-3891
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.0
         Environment: Spark 1.2.0 trunk (ac302052870a650d56f2d3131c27755bb2960ad7) on CDH 5.1.0
CentOS 6.5
8x 2 GHz cores, 24 GB RAM
            Reporter: Anand Mohan Tumuluri


Spark PR 2620 adds support for the Hive percentile UDAF. However, Hive's percentile and percentile_approx UDAFs also support returning an array of percentile values, with the syntax
percentile(BIGINT col, array(p1 [, p2]...)) or
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B])
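
For reference, a minimal reproduction sketch from spark-shell, assuming the exam table from the failing query below exists in the Hive metastore (the single-percentile call is included only to contrast the working and failing forms):

    // in spark-shell (sc is provided by the shell)
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // Single percentile value: works after PR 2620
    hiveContext.sql(
      "SELECT name, percentile(turnaroundtime, 0.5) FROM exam GROUP BY name"
    ).collect()

    // Array of percentile values: fails with the ClassCastException below
    hiveContext.sql(
      "SELECT name, percentile(turnaroundtime, array(0, 0.25, 0.5, 0.75, 1)) FROM exam GROUP BY name"
    ).collect()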

These queries fail with the error below:

0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com): java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to [Ljava.lang.Object;
        org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
        org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
        org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
        org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
        org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
        org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
        org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        org.apache.spark.scheduler.Task.run(Task.scala:56)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)
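
Judging from the trace, the cast fails because Spark hands Hive's StandardListObjectInspector a scala.collection.mutable.ArrayBuffer where the inspector expects a Java Object[] (see getListLength at StandardListObjectInspector.java:83 above). A minimal sketch of the mismatch and one plausible conversion; wrapForHive is a hypothetical helper, not Spark's actual code:

    import scala.collection.mutable.ArrayBuffer

    // What Catalyst produces for the literal array(0, 0.25, 0.5, 0.75, 1)
    val fromCatalyst: Any = ArrayBuffer(0.0, 0.25, 0.5, 0.75, 1.0)

    // StandardListObjectInspector.getListLength effectively performs
    //   fromCatalyst.asInstanceOf[Array[Object]]
    // which throws the ClassCastException above.

    // One plausible fix: unwrap Scala sequences into a Java array before the
    // arguments reach GenericUDAFBridgeEvaluator.iterate
    def wrapForHive(value: Any): Any = value match {
      case seq: Seq[_] => seq.map(_.asInstanceOf[AnyRef]).toArray
      case other       => other
    }

    val forHive = wrapForHive(fromCatalyst)  // now an Array[AnyRef]

If that diagnosis is right, the conversion presumably belongs in HiveUdafFunction.update (hiveUdfs.scala:342 in the trace), where arguments are passed to the Hive evaluator.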


