[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879929#action_12879929
 ] 

John Sichi commented on HIVE-287:
---------------------------------

Sorry to chime in late, but I have one big question:  can we instead do this 
in a way that does not break the UDAF interface?

The existing patch adds a new method to the GenericUDAFResolver interface, 
meaning all existing plugin implementations outside of the Hive codebase will 
fail to compile (because we did not already have an insulating abstract base 
class available).  We already have some such plugins within Facebook.

Let's analyze the two new parameters one by one.

isDistinct:  this doesn't actually modify the choice of evaluator 
implementation at all, since the actual duplicate elimination takes place 
upstream of the UDAF invocation.  So rather than adding this parameter, can we 
add a new method supportsDistinct() on GenericUDAFEvaluator?  We would then 
call it after instantiating the new evaluator in order to carry out the 
additional validation.
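
To make the isDistinct suggestion concrete, here is a rough, self-contained 
sketch.  The classes are simplified stand-ins rather than the real Hive 
classes; the default return value of false, the validate() helper, and the 
evaluator names are all assumptions for illustration only.

```java
// Sketch only: simplified stand-ins, not Hive's actual classes.
public class SupportsDistinctSketch {

    /** Stand-in for GenericUDAFEvaluator with the proposed new method. */
    static abstract class Evaluator {
        // Assumption: default to false, so evaluators that predate the
        // method keep compiling and simply do not opt in to DISTINCT.
        boolean supportsDistinct() {
            return false;
        }
    }

    /** Hypothetical evaluator that opts in to DISTINCT. */
    static class CountEvaluator extends Evaluator {
        @Override
        boolean supportsDistinct() {
            return true;
        }
    }

    /** Hypothetical pre-existing plugin evaluator that knows nothing of DISTINCT. */
    static class LegacyPluginEvaluator extends Evaluator {
    }

    /**
     * Validation the caller would run after the resolver has returned an
     * evaluator.  The resolver signature itself is left unchanged.
     */
    static void validate(String functionName, Evaluator evaluator, boolean isDistinct) {
        if (isDistinct && !evaluator.supportsDistinct()) {
            throw new IllegalArgumentException(
                "DISTINCT is not supported by " + functionName);
        }
    }

    public static void main(String[] args) {
        validate("count", new CountEvaluator(), true);          // accepted
        validate("my_udaf", new LegacyPluginEvaluator(), false); // accepted
        try {
            validate("my_udaf", new LegacyPluginEvaluator(), true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the DISTINCT check lives next to the 
evaluator, so the resolver interface does not need a new parameter.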

isAllColumns:  COUNT(*) is probably the only function which is ever even going 
to care about this one.  Couldn't we just use an empty array of TypeInfo to 
indicate all columns?
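
A minimal sketch of that convention, again with stand-in types: TypeInfo 
here is a simplified placeholder for the Hive class, and isAllColumns() and 
chooseCountMode() are hypothetical helper names, not existing API.

```java
// Sketch only: simplified stand-ins, not Hive's actual classes.
public class AllColumnsSketch {

    /** Hypothetical placeholder for Hive's TypeInfo. */
    static final class TypeInfo {
        final String typeName;

        TypeInfo(String typeName) {
            this.typeName = typeName;
        }
    }

    /** Proposed convention: an empty parameter array means "all columns". */
    static boolean isAllColumns(TypeInfo[] parameters) {
        return parameters.length == 0;
    }

    /** Stand-in for a COUNT resolver's decision; returns a label only. */
    static String chooseCountMode(TypeInfo[] parameters) {
        return isAllColumns(parameters) ? "count-rows" : "count-non-null";
    }

    public static void main(String[] args) {
        // COUNT(*) would arrive with no parameter types at all.
        System.out.println(chooseCountMode(new TypeInfo[0]));
        // COUNT(col1) would arrive with one parameter type.
        System.out.println(chooseCountMode(new TypeInfo[] { new TypeInfo("string") }));
    }
}
```

One consequence of this convention is that a call written with no arguments 
at all would be indistinguishable from the (*) form, so the parser would have 
to reject or define that case separately.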

Independent of the above, I think adding the insulating abstract base class 
should still be done now to make future transitions smoother when breaking the 
interface is absolutely required.  So keep that part of the patch.
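
For anyone unfamiliar with the pattern, here is a rough sketch of how an 
insulating abstract base class absorbs an interface change.  Resolver, 
AbstractResolver and PluginResolver are hypothetical names, evaluators are 
represented by plain strings, and having the default delegate to the original 
method is an assumption, not what the patch necessarily does.

```java
// Sketch only: simplified stand-ins, not Hive's actual classes.
public class AbstractBaseSketch {

    /** Stand-in for a resolver interface that has grown a second method. */
    interface Resolver {
        // Original method.
        String getEvaluator(String[] parameterTypes);

        // Method added in a later release (hypothetical signature).
        String getEvaluator(String[] parameterTypes, boolean isDistinct);
    }

    /**
     * Insulating base class.  Plugins extend this instead of implementing
     * the interface directly, so a newly added interface method gets a
     * default here and the plugins keep compiling.
     */
    static abstract class AbstractResolver implements Resolver {
        @Override
        public String getEvaluator(String[] parameterTypes, boolean isDistinct) {
            // Assumption: fall back to the original method and ignore the
            // new argument.
            return getEvaluator(parameterTypes);
        }
    }

    /** A plugin written before the new method existed; it needs no changes. */
    static class PluginResolver extends AbstractResolver {
        @Override
        public String getEvaluator(String[] parameterTypes) {
            return "plugin-evaluator(" + parameterTypes.length + ")";
        }
    }

    public static void main(String[] args) {
        Resolver resolver = new PluginResolver();
        System.out.println(resolver.getEvaluator(new String[] { "int", "string" }, true));
    }
}
```

A plugin that implemented Resolver directly would fail to compile once the 
second method was added, which is exactly the breakage described above.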


> count distinct on multiple columns does not work
> ------------------------------------------------
>
>                 Key: HIVE-287
>                 URL: https://issues.apache.org/jira/browse/HIVE-287
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Arvind Prabhakar
>         Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

