[
https://issues.apache.org/jira/browse/HIVE-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697577#action_12697577
]
Adam Kramer commented on HIVE-362:
----------------------------------
This is not a problem anymore.
Just to be a bit opinionated for a moment, though: I do believe the standards
are wrong on this issue. NULL values are an excellent way to force scientists
to really think about the query they're running. Implicitly removing them will
generally lead to harder-to-debug errors and more wasted time than requiring an
explicit call to a "remove nulls" version (call it avg_rn).
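
For illustration only, here is a minimal Python sketch of the three NULL
semantics discussed in this issue. It is not Hive code and does not reflect how
Hive implements its aggregates. None stands in for NULL; avg_rn is the name
suggested above, while avg_strict and avg_null_as_zero are hypothetical names
chosen for this sketch.

```python
from typing import Iterable, Optional


def avg_strict(values: Iterable[Optional[float]]) -> Optional[float]:
    """NULL-propagating average (hypothetical name).

    Returns None as soon as the first None is seen, so no further
    input is consumed after that point.
    """
    total = 0.0
    count = 0
    for v in values:
        if v is None:
            return None  # early exit: the result is already known
        total += v
        count += 1
    return total / count if count else None


def avg_rn(values: Iterable[Optional[float]]) -> Optional[float]:
    """"Remove nulls" average: computed over the non-None items only."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


def avg_null_as_zero(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average that counts each None as 0.0 (hypothetical name)."""
    total = 0.0
    count = 0
    for v in values:
        total += 0.0 if v is None else v
        count += 1
    return total / count if count else None


if __name__ == "__main__":
    data = [1.0, None, 2.0]
    print(avg_strict(data), avg_rn(data), avg_null_as_zero(data))
```

The three functions give different answers on the same input, which is the
reason for wanting the choice to be explicit in the query rather than implicit.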
> avg() ignores null values; consider variant that doesn't
> --------------------------------------------------------
>
> Key: HIVE-362
> URL: https://issues.apache.org/jira/browse/HIVE-362
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Adam Kramer
>
> Some of the current aggregates (sum, avg) have a fairly standard behavior: If
> any item in the list is NULL, the sum, average, etc., cannot be computed. And
> so, NULL is returned.
> 1) If this is the case, the query should return much faster--see a null,
> return NULL, exit(0).
> 2) It would be nice to have versions or ways to use these functions with NULL
> data--specifically, to treat the NULL as zero or to ignore the NULL and
> return the results for non-NULL data.
> This also would apply to the variance functions referenced in
> https://issues.apache.org/jira/browse/HIVE-165