Jianfeng Hu created SPARK-14171:
-----------------------------------

             Summary: UDAF aggregates argument object inspector not parsed 
correctly
                 Key: SPARK-14171
                 URL: https://issues.apache.org/jira/browse/SPARK-14171
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Jianfeng Hu
            Priority: Blocker


For example, when using percentile_approx and count distinct together, it 
raises an error complaining the argument is not constant. We have a test case 
to reproduce. Could you help look into a fix of this? This was working in 
previous version (Spark 1.4 + Hive 0.13). Thanks!

```--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
@@ -148,6 +148,9 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton 
with SQLTestUtils {

     checkAnswer(sql("SELECT percentile_approx(100.0, array(0.9, 0.9)) FROM src 
LIMIT 1"),
       sql("SELECT array(100, 100) FROM src LIMIT 1").collect().toSeq)
+
+    checkAnswer(sql("SELECT percentile_approx(key, 0.99999), count(distinct 
key) FROM src LIMIT 1"),
+      sql("SELECT max(key), 1 FROM src LIMIT 1").collect().toSeq)
    }

   test("UDFIntegerToString") {```


When running the test suite, we can see this error:

```
- Generic UDAF aggregates *** FAILED ***
  org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
tree: 
hiveudaffunction(HiveFunctionWrapper(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox,org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox@6e1dc6a7),key#51176,0.99999,false,0,0)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
  at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:357)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:238)
  at 
org.apache.spark.sql.catalyst.analysis.DistinctAggregationRewriter.org$apache$spark$sql$catalyst$analysis$DistinctAggregationRewriter$$patchAggregateFunctionChildren$1(DistinctAggregationRewriter.scala:148)
  at 
org.apache.spark.sql.catalyst.analysis.DistinctAggregationRewriter$$anonfun$15.apply(DistinctAggregationRewriter.scala:192)
  at 
org.apache.spark.sql.catalyst.analysis.DistinctAggregationRewriter$$anonfun$15.apply(DistinctAggregationRewriter.scala:190)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  ...
  Cause: java.lang.reflect.InvocationTargetException:
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1$$anonfun$apply$12.apply(TreeNode.scala:368)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1$$anonfun$apply$12.apply(TreeNode.scala:367)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:365)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:357)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
  ...
  Cause: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The second 
argument must be a constant, but double was passed instead.
  at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox.getEvaluator(GenericUDAFPercentileApprox.java:147)
  at 
org.apache.spark.sql.hive.HiveUDAFFunction.functionAndInspector$lzycompute(hiveUDFs.scala:598)
  at 
org.apache.spark.sql.hive.HiveUDAFFunction.functionAndInspector(hiveUDFs.scala:596)
  at 
org.apache.spark.sql.hive.HiveUDAFFunction.returnInspector$lzycompute(hiveUDFs.scala:606)
  at 
org.apache.spark.sql.hive.HiveUDAFFunction.returnInspector(hiveUDFs.scala:606)
  at org.apache.spark.sql.hive.HiveUDAFFunction.<init>(hiveUDFs.scala:654)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  ...
```




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to