[ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2750:
------------------------------
Attachment: HIVE-2750.D1455.1.patch
kevinwilfong requested code review of "HIVE-2750 [jira] Hive multi group by
single reducer optimization causes invalid column reference error".
Reviewers: JIRA
When generating the list of value columns for the reduce sink operator, in
the case of multiple group bys occurring in the same reducer, only the columns
used by the first query block were being considered, due to a typo. This patch
fixes the typo and adds a test case to ensure the error does not recur.
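As a rough illustration of this failure mode, here is a minimal Java sketch.
It is not the code from SemanticAnalyzer.java, and the actual typo is not
reproduced here. The class name, the method names, and the idea of modelling
each query block as a plain list of the columns read by its non-distinct
aggregations are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: each query block is reduced to the list of columns
// read by its non-distinct aggregations. None of these names come from Hive.
class ReduceSinkValueColumns {

    // Assumed shape of the bug: the loop runs once per query block,
    // but every iteration reads the first block's columns.
    static Set<String> valueColumnsBuggy(List<List<String>> columnsPerBlock) {
        Set<String> values = new LinkedHashSet<String>();
        for (int i = 0; i < columnsPerBlock.size(); i++) {
            values.addAll(columnsPerBlock.get(0)); // index 0 instead of i
        }
        return values;
    }

    // Intended behaviour: take the union of the columns of every block
    // that shares the reducer.
    static Set<String> valueColumnsFixed(List<List<String>> columnsPerBlock) {
        Set<String> values = new LinkedHashSet<String>();
        for (List<String> columns : columnsPerBlock) {
            values.addAll(columns);
        }
        return values;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = new ArrayList<List<String>>();
        blocks.add(new ArrayList<String>());     // dest_g2: no non-distinct aggregation
        blocks.add(Arrays.asList("src.value"));  // dest_g3: count(src.value)

        System.out.println(valueColumnsBuggy(blocks)); // prints []
        System.out.println(valueColumnsFixed(blocks)); // prints [src.value]
    }
}
```

In the buggy variant src.value never reaches the value list, so a later lookup
for it on the reduce side would have nothing to resolve against. That is
consistent with the invalid column reference error described in this issue.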
With the optimization applied, if two query blocks have the same distinct
clause and the same group by keys, but the first query block does not reference
all the columns the second query block does, an invalid column reference error
is raised for the columns unreferenced in the first query block.
E.g.
FROM src
INSERT OVERWRITE TABLE dest_g2
  SELECT substr(src.key,1,1), count(DISTINCT src.key)
  WHERE substr(src.key,1,1) >= 5
  GROUP BY substr(src.key,1,1)
INSERT OVERWRITE TABLE dest_g3
  SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value)
  WHERE substr(src.key,1,1) < 5
  GROUP BY substr(src.key,1,1);
This results in an invalid column reference error on src.value, which the
first query block (dest_g2) never references.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D1455
AFFECTED FILES
ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> Hive multi group by single reducer optimization causes invalid column
> reference error
> -------------------------------------------------------------------------------------
>
> Key: HIVE-2750
> URL: https://issues.apache.org/jira/browse/HIVE-2750
> Project: Hive
> Issue Type: Bug
> Reporter: Kevin Wilfong
> Assignee: Kevin Wilfong
> Attachments: HIVE-2750.D1455.1.patch
>
>
> With the optimization applied, if two query blocks have the same distinct
> clause and the same group by keys, but the first query block does not reference
> all the columns the second query block does, an invalid column reference error
> is raised for the columns unreferenced in the first query block.
> E.g.
> FROM src
> INSERT OVERWRITE TABLE dest_g2
>   SELECT substr(src.key,1,1), count(DISTINCT src.key)
>   WHERE substr(src.key,1,1) >= 5
>   GROUP BY substr(src.key,1,1)
> INSERT OVERWRITE TABLE dest_g3
>   SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value)
>   WHERE substr(src.key,1,1) < 5
>   GROUP BY substr(src.key,1,1);
> This results in an invalid column reference error on src.value, which the
> first query block (dest_g2) never references.