[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879983#action_12879983 ]
Arvind Prabhakar commented on HIVE-287:
---------------------------------------

@John: Thanks for reviewing this change. I have some follow-up comments and suggestions:

bq. isDistinct: this doesn't actually modify the choice of evaluator implementation at all, since the actual duplicate elimination takes place upstream of the UDAF invocation. So instead of adding this parameter, can we instead add a new method supportsDistinct() on GenericUDAFEvaluator?

While the evaluation may be happening upstream, I was concerned that this does not rule out cases where the information is relevant to the function invocation itself. For example, the implementation of {{count}} requires that if there is a valid argument list, it must be qualified with {{DISTINCT}}.

bq. isAllColumns: COUNT is probably the only function which is ever even going to care about this one. Couldn't we just use an empty array of TypeInfo to indicate all columns?

I had a similar idea, but after some consideration opted for a simpler design. I felt that overloading arguments to indicate special cases might lead to confusion, and to eventual problems when a use case emerges that invalidates this assumption.

I do agree with your point that it will be good to stay compatible if possible. One way to do it would be as follows:
# Revert the {{GenericUDAFResolver}} to its previous state, but deprecate the interface in favor of the abstract base class.
# Push the newly introduced method into the {{AbstractGenericUDAFResolver}} implementation.
# Modify the {{FunctionRegistry.getGenericUDAFEvaluator()}} method to test whether the resolver instance is type-compatible with {{AbstractGenericUDAFResolver}} and, if so, invoke the new method. Otherwise, fall back to the old mechanism.

What do you think about this approach?
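To make the three steps concrete, here is a rough, self-contained sketch of how the dispatch could look. It is not the code from the attached patches. {{Evaluator}}, {{CountResolver}}, {{SumResolver}}, and the use of plain strings in place of {{TypeInfo}} are stand-ins invented for illustration, and the method signatures are assumptions. The {{count}} rule shown (multiple arguments require {{DISTINCT}}) is a toy version and may not match the patch exactly.

```java
import java.util.Arrays;
import java.util.List;

public class ResolverDispatchSketch {

    /** Stand-in for an evaluator; it only records what was resolved. */
    public static final class Evaluator {
        public final String description;
        Evaluator(String description) { this.description = description; }
    }

    /** Step 1: the old interface stays as it was, but is deprecated. */
    @Deprecated
    public interface GenericUDAFResolver {
        Evaluator getEvaluator(List<String> argTypes);
    }

    /** Step 2: the new method lives on the abstract base class only. */
    public abstract static class AbstractGenericUDAFResolver implements GenericUDAFResolver {
        // Default behavior: ignore the extra flags and use the old method.
        public Evaluator getEvaluator(List<String> argTypes,
                                      boolean isDistinct, boolean isAllColumns) {
            return getEvaluator(argTypes);
        }
    }

    /** Hypothetical count-like resolver that needs the extra flags. */
    public static final class CountResolver extends AbstractGenericUDAFResolver {
        @Override
        public Evaluator getEvaluator(List<String> argTypes) {
            return getEvaluator(argTypes, false, false);
        }

        @Override
        public Evaluator getEvaluator(List<String> argTypes,
                                      boolean isDistinct, boolean isAllColumns) {
            if (isAllColumns) {
                return new Evaluator("count(*)");
            }
            // Toy rule: several columns only make sense with DISTINCT.
            if (argTypes.size() > 1 && !isDistinct) {
                throw new IllegalArgumentException("DISTINCT keyword must be specified");
            }
            String prefix = isDistinct ? "count(DISTINCT " : "count(";
            return new Evaluator(prefix + String.join(", ", argTypes) + ")");
        }
    }

    /** Hypothetical legacy resolver that implements only the old interface. */
    public static final class SumResolver implements GenericUDAFResolver {
        @Override
        public Evaluator getEvaluator(List<String> argTypes) {
            return new Evaluator("sum(" + String.join(", ", argTypes) + ")");
        }
    }

    /** Step 3: the registry prefers the new method when the resolver has it. */
    public static Evaluator getGenericUDAFEvaluator(GenericUDAFResolver resolver,
                                                    List<String> argTypes,
                                                    boolean isDistinct,
                                                    boolean isAllColumns) {
        if (resolver instanceof AbstractGenericUDAFResolver) {
            return ((AbstractGenericUDAFResolver) resolver)
                    .getEvaluator(argTypes, isDistinct, isAllColumns);
        }
        // Old mechanism: the flags are not passed to legacy resolvers.
        return resolver.getEvaluator(argTypes);
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("string", "int");
        System.out.println(
            getGenericUDAFEvaluator(new CountResolver(), cols, true, false).description);
        System.out.println(
            getGenericUDAFEvaluator(new SumResolver(), Arrays.asList("int"), false, false).description);
    }
}
```

With this shape, existing resolvers that implement only the old interface keep working unchanged, while resolvers that extend the base class can opt in to the extra information by overriding the new method.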
> count distinct on multiple columns does not work
> ------------------------------------------------
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.