[jira] [Commented] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results

Hyukjin Kwon (JIRA) Sun, 16 Apr 2017 02:04:14 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970296#comment-15970296
 ]


Hyukjin Kwon commented on SPARK-13680:
--------------------------------------

Could we narrow down the scope? It seems the query itself is too complex. It'd 
be great if this can be tested against the current master if possible.

What was the expected output and what was the actual output? I think I'd rather 
close this if the reporter is not active.

> Java UDAF with more than one intermediate argument returns wrong results
> ------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv, setup.hql
>
>
> I am trying to incorporate the Java UDAF from 
> https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
>  into an SQL query. 
> I registered the UDAF like this:
>  sqlContext.udf().register("myavg", new MyDoubleAvg());
> My SQL query is:
> SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`, 
> AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS 
> `avg_stdevi`, MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS 
> `max_ci`, MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS 
> `max_stdevi`, MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS 
> `min_ci`, MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS 
> `min_stdevi`,SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS 
> `sum_ci`, SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS 
> `sum_stdevi`, myavg(seqd) as `myavg_seqd`,          AVG(zero) AS `avg_zero`, 
> AVG(nulli) AS `avg_nulli`,AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`, 
> SUM(nulli) AS `sum_nulli`,SUM(nulld) AS `sum_nulld`,MAX(zero) AS `max_zero`, 
> MAX(nulli) AS `max_nulli`,MAX(nulld) AS `max_nulld`,count( * ) AS 
> `count_all`, count(nulli) AS `count_nulli` FROM mytable
> As soon as I add the UDAF myavg to the SQL, all the results become incorrect. 
> When I remove the call to the UDAF, the results are correct.
> I was able to go around the issue by modifying bufferSchema of the UDAF to 
> use an array and the corresponding update and merge methods. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results

Reply via email to