[ 
https://issues.apache.org/jira/browse/HIVE-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688777#action_12688777
 ] 

Adam Kramer commented on HIVE-362:
----------------------------------

Very sorry! After further testing, the actual problem I was having was  that 
SUM returns a double, and I was putting it into an INT container without an 
explicit cast.

So, maybe this bug is a request to change the wiki to make this clear.

Also...let's actually say the opposite. We really should have AVG or some 
version of it return NULL if any NULL values are found. This is the default 
behavior in R, and for good reason...I will link to the description of why once 
I find it.

Basically, the argument is that when a sum, average, or variance depends on n 
in some sense, comparisons among these groups cannot be trusted. For example, 
if there are 5 nulls in a set of 20 variables, SUM(x) / COUNT(x) returns a 
different value than AVG(x)...that is very unintuitive.


> avg() returns null if any item in the list is null; create function that 
> doesn't
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-362
>                 URL: https://issues.apache.org/jira/browse/HIVE-362
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Adam Kramer
>
> Some of the current aggregates (sum, avg) have a fairly standard behavior: If 
> any item in the list is NULL, the sum, average, etc., cannot be computed. And 
> so, NULL is returned.
> 1) If this is the case, the query should return much faster--see a null, 
> return NULL, exit(0).
> 2) It would be nice to have versions or ways to use these functions with NULL 
> data--specifically, to treat the NULL as zero or to ignore the NULL and 
> return the results for non-NULL data.
> This also would apply to the variance functions referenced in 
> https://issues.apache.org/jira/browse/HIVE-165

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to