[
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716057#action_12716057
]
Zheng Shao commented on HIVE-474:
---------------------------------
There are several approaches to solve this problem:
A1: separate group-by and join the results.
{code}
SELECT COALESCE(t1.key, t2.key), COALESCE(d_a, 0), , COALESCE(d_b, 0)
FROM
(SELECT key, count(distinct a) as d_a ...) t1
OUTER JOIN
(SELECT key, count(distinct b) as d_b ...) t2
ON t1.key = t2.key
{code}
A2: Take advantage of union type (HIVE-537).
See HIVE-537 for details.
A3: Take advantage of partitioned merge join:
Here is a different plan. It depends on partitioned merge join.
Also the 2 jobs have to have the same
{code}
Query:
select a, count(distinct b), count(distinct c), sum(d)
Plan:
Job1:
Map side:
Emit: distribution_key: a, sort_key: a, b, value: d
Save a, c to temp_file1
Reduce side:
Group By:
a, count(distinct b), sum(d)
Output: temp_file2
Job 2: Input: temp_file1
Map side:
Emit: distribution_key: a, sort_key: a, c, value: nothing
Reduce side:
Group By
a, count(distinct c)
Partitioned Merge Join with temp_file2 on a
a, count(distinct b), sum(d), count(distinct c)
{code}
> Support for distinct selection on two or more columns
> -----------------------------------------------------
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Alexis Rondeau
>
> The ability to select distinct several, individual columns as by example:
> select count(distinct user), count(distinct session) from actions;
> Currently returns the following failure:
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns
> not Supported user
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.