Parth Gandhi created SPARK-24935:
------------------------------------
Summary: Problem with Executing Hive UDF's from Spark 2.2 Onwards
Key: SPARK-24935
URL: https://issues.apache.org/jira/browse/SPARK-24935
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1, 2.2.0
Reporter: Parth Gandhi
A user of the sketches library (https://github.com/DataSketches/sketches-hive)
reported an issue with the HLL Sketch Hive UDAF that seems to be a bug in Spark
or Hive. Their code runs fine in Spark 2.1 but fails from 2.2 onwards. For more
details, refer to the discussion on the sketches-user list:
[https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ]
On further debugging, we found that from 2.2 onwards, Spark's Hive UDAF
support performs partial aggregation and no longer supports complete-mode
aggregation (see https://issues.apache.org/jira/browse/SPARK-19060 and
https://issues.apache.org/jira/browse/SPARK-18186). As a result, where the
sketch code expects its update method to be called, the merge method is called
instead
(https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56),
which throws the exception described in the forum thread above.
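
To illustrate the difference between the two call sequences, here is a minimal
toy model in Python. It is not Spark, Hive, or sketches-hive code: the class
and function names are hypothetical, an exact set stands in for the HLL sketch,
and the method names only mirror the iterate / terminatePartial / merge /
terminate steps of Hive's GenericUDAFEvaluator. It shows that in complete mode
the final evaluator only ever sees raw rows, while with partial aggregation it
only ever sees merge calls.

```python
from enum import Enum


class Mode(Enum):
    # Hive's four evaluator modes and the calls each one makes.
    PARTIAL1 = "iterate + terminate_partial"
    PARTIAL2 = "merge + terminate_partial"
    FINAL = "merge + terminate"
    COMPLETE = "iterate + terminate"


class DistinctCountEvaluator:
    """Hypothetical toy UDAF; an exact set replaces the real sketch."""

    def __init__(self):
        self.state = set()
        self.calls = []  # records which methods were invoked, in order

    def iterate(self, value):
        # Raw-row path (the "update" path in the report above).
        self.calls.append("iterate")
        self.state.add(value)

    def terminate_partial(self):
        self.calls.append("terminate_partial")
        return frozenset(self.state)

    def merge(self, partial):
        # Partial-result path: receives intermediate state, not raw rows.
        self.calls.append("merge")
        self.state |= partial

    def terminate(self):
        self.calls.append("terminate")
        return len(self.state)


def run_complete(rows):
    """COMPLETE mode: one evaluator, raw rows in, final result out."""
    ev = DistinctCountEvaluator()
    for row in rows:
        ev.iterate(row)
    return ev.terminate(), ev.calls


def run_partial_then_final(partitions):
    """PARTIAL1 on each partition, then FINAL over the partial results."""
    partials = []
    for rows in partitions:
        ev = DistinctCountEvaluator()
        for row in rows:
            ev.iterate(row)
        partials.append(ev.terminate_partial())
    final = DistinctCountEvaluator()
    for partial in partials:
        final.merge(partial)
    return final.terminate(), final.calls
```

An evaluator written on the assumption that its final stage receives raw rows
would work under run_complete but would see only merge calls under
run_partial_then_final, which is the kind of mismatch described above.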