[
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879983#action_12879983
]
Arvind Prabhakar commented on HIVE-287:
---------------------------------------
@John: Thanks for reviewing this change. I have some follow-up comments and
suggestions:
bq. isDistinct: this doesn't actually modify the choice of evaluator
implementation at all, since the actual duplicate elimination takes place
upstream of the UDAF invocation. So instead of adding this parameter, can we
instead add a new method supportsDistinct() on GenericUDAFEvaluator?
While the duplicate elimination may happen upstream, that does not rule out
cases where this information matters to the function invocation itself. For
example, the implementation of {{count}} requires that if there is a valid
argument list, it must be qualified with {{DISTINCT}}.
bq. isAllColumns: COUNT is probably the only function which is ever even going
to care about this one. Couldn't we just use an empty array of TypeInfo to
indicate all columns?
I had a similar idea, but after some consideration opted for a simpler design.
I felt that overloading the arguments to indicate special cases might lead to
confusion, and to eventual problems when a use case emerges that invalidates
this assumption.
I do agree with your point that it would be good to stay compatible if possible.
One way to do it would be as follows:
# Revert {{GenericUDAFResolver}} to its previous state, but deprecate the
interface in favor of the abstract base class.
# Push the newly introduced method into the {{AbstractGenericUDAFResolver}}
implementation.
# Modify the {{FunctionRegistry.getGenericUDAFEvaluator()}} method to test
whether the resolver instance is type compatible with
{{AbstractGenericUDAFResolver}} and, if so, invoke the new method. Otherwise
fall back to the old mechanism.
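To make step 3 concrete, here is a rough sketch of the dispatch I have in mind.
It is not taken from the patch: the types below are simplified stand-ins that
only borrow the names {{GenericUDAFResolver}} and
{{AbstractGenericUDAFResolver}}, and the three-argument {{getEvaluator}}
signature, the {{String}} return value, and the {{LegacyResolver}},
{{CountResolver}} and {{ResolverDispatch}} classes are all assumptions made for
illustration.

```java
// Hypothetical, simplified stand-ins; NOT the real Hive classes.
// An evaluator is represented by a descriptive String to keep this short.

/** Old-style resolver: sees only the parameter types. */
interface GenericUDAFResolver {
    String getEvaluator(String[] parameterTypes);
}

/** New-style base class carrying the extra flags (assumed signature). */
abstract class AbstractGenericUDAFResolver implements GenericUDAFResolver {
    // Default: ignore the flags and delegate to the old method, so
    // subclasses that do not care about them need not override this.
    public String getEvaluator(String[] parameterTypes,
                               boolean isDistinct,
                               boolean isAllColumns) {
        return getEvaluator(parameterTypes);
    }
}

/** A resolver written against the old interface only. */
class LegacyResolver implements GenericUDAFResolver {
    @Override
    public String getEvaluator(String[] parameterTypes) {
        return "legacy(" + String.join(", ", parameterTypes) + ")";
    }
}

/** A resolver that cares about the new flags. */
class CountResolver extends AbstractGenericUDAFResolver {
    @Override
    public String getEvaluator(String[] parameterTypes) {
        return getEvaluator(parameterTypes, false, false);
    }

    @Override
    public String getEvaluator(String[] parameterTypes,
                               boolean isDistinct,
                               boolean isAllColumns) {
        if (isAllColumns) {
            return "count(*)";
        }
        String prefix = isDistinct ? "DISTINCT " : "";
        return "count(" + prefix + String.join(", ", parameterTypes) + ")";
    }
}

/** Stand-in for the check proposed for the function registry. */
class ResolverDispatch {
    static String getEvaluator(GenericUDAFResolver resolver,
                               String[] parameterTypes,
                               boolean isDistinct,
                               boolean isAllColumns) {
        if (resolver instanceof AbstractGenericUDAFResolver) {
            // New mechanism: pass the extra information through.
            return ((AbstractGenericUDAFResolver) resolver)
                    .getEvaluator(parameterTypes, isDistinct, isAllColumns);
        }
        // Old mechanism: existing resolvers keep working unchanged.
        return resolver.getEvaluator(parameterTypes);
    }

    public static void main(String[] args) {
        String[] cols = {"int", "string"};
        System.out.println(getEvaluator(new LegacyResolver(), cols, true, false));
        System.out.println(getEvaluator(new CountResolver(), cols, true, false));
    }
}
```

With this shape, resolvers written against the old interface never see the new
flags and continue to compile and run, while resolvers extending the base class
can opt in by overriding the new method.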
What do you think about this approach?
> count distinct on multiple columns does not work
> ------------------------------------------------
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl