[
https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710435#comment-15710435
]
Xuefu Zhang edited comment on HIVE-15239 at 12/1/16 1:12 AM:
-------------------------------------------------------------
Sorry for the delay.
Re: my point #1, I was referring to this:
{code}
Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
if (firstRootOperators.size() != secondRootOperators.size()) {
return false;
}
// need to check paths and partition desc for MapWorks
if (first instanceof MapWork && !compareMapWork((MapWork) first,
(MapWork) second)) {
return false;
}
{code}
I think it's better to reorder it as follows, so that the code for the operator
check stays together as one logical unit.
{code}
// need to check paths and partition desc for MapWorks
if (first instanceof MapWork && !compareMapWork((MapWork) first,
(MapWork) second)) {
return false;
}
Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
if (firstRootOperators.size() != secondRootOperators.size()) {
return false;
}
{code}
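Here is a minimal, self-contained sketch of the reordered check. `Work`, `MapWork`, the string-valued root operators and the sample warehouse path are hypothetical simplifications, not Hive's real `BaseWork`/`MapWork` API. The class check at the top is an assumption standing in for whatever precedes this snippet in the real method, and the operator-tree comparison is reduced to set equality.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of comparing two works for equivalence (not Hive's real classes). */
class WorkComparisonSketch {

    static class Work {
        private final Set<String> rootOperators; // operator trees reduced to strings (assumption)

        Work(Set<String> rootOperators) {
            this.rootOperators = rootOperators;
        }

        Set<String> getAllRootOperators() {
            return rootOperators;
        }
    }

    static class MapWork extends Work {
        private final Map<String, String> pathToPartition; // input path -> partition spec (assumption)

        MapWork(Set<String> rootOperators, Map<String, String> pathToPartition) {
            super(rootOperators);
            this.pathToPartition = pathToPartition;
        }
    }

    static boolean compareMapWork(MapWork first, MapWork second) {
        // Different input paths or partition descriptors mean the works read different data.
        return first.pathToPartition.equals(second.pathToPartition);
    }

    static boolean compareWork(Work first, Work second) {
        if (first.getClass() != second.getClass()) {
            return false; // makes the MapWork cast below safe
        }
        // need to check paths and partition desc for MapWorks
        if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
            return false;
        }
        // Operator checks kept together as one logical unit.
        Set<String> firstRootOperators = first.getAllRootOperators();
        Set<String> secondRootOperators = second.getAllRootOperators();
        if (firstRootOperators.size() != secondRootOperators.size()) {
            return false;
        }
        // Stand-in for the real operator-tree comparison.
        return firstRootOperators.equals(secondRootOperators);
    }

    public static void main(String[] args) {
        // Both UNION ALL branches have the same operator shape...
        Set<String> ops = Set.of("TS-FIL-SEL");
        MapWork scanA1 = new MapWork(ops, Map.of("/warehouse/a1/end_dt=20161020", "end_dt=20161020"));
        MapWork scanA2 = new MapWork(ops, Map.of()); // a2 has no partitions, so no input paths
        // ...but they read different inputs, so they must not be combined.
        System.out.println(compareWork(scanA1, scanA2)); // false
        System.out.println(compareWork(scanA1, scanA1)); // true
    }
}
```

In this sketch both checks are side-effect-free early returns, so swapping their order does not change the result; it only keeps the root-operator checks next to each other.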
As to the exhaustive check, your fix will solve the problem described here. I
suspect it is even possible for two MapWorks to work on different partitions of
the same table, for example in a union.
Overall, I feel more testing is needed for this feature. Of course, this goes
beyond the scope of this JIRA.
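The union case above can be illustrated with a hypothetical sketch: two branches scan the same table `a1` but different partitions, so a check that only compares which tables are scanned would still treat them as equivalent, while comparing the full input-path-to-partition mapping keeps them apart. `PartitionInput`, `sameTables`, `sameInputs` and the paths are made up for illustration and are not Hive APIs.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch: matching table names alone is not enough to merge two map-side works. */
class PartitionInputCheck {

    /** One input of a map-side work: the table and partition it reads (simplified). */
    static final class PartitionInput {
        final String table;
        final String partitionSpec;

        PartitionInput(String table, String partitionSpec) {
            this.table = table;
            this.partitionSpec = partitionSpec;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof PartitionInput)) {
                return false;
            }
            PartitionInput other = (PartitionInput) o;
            return table.equals(other.table) && partitionSpec.equals(other.partitionSpec);
        }

        @Override
        public int hashCode() {
            return Objects.hash(table, partitionSpec);
        }
    }

    /** Collects the distinct table names a work reads from. */
    static Set<String> tablesOf(Map<String, PartitionInput> inputs) {
        Set<String> tables = new HashSet<>();
        for (PartitionInput input : inputs.values()) {
            tables.add(input.table);
        }
        return tables;
    }

    /** Weak check: only compares which tables are scanned. */
    static boolean sameTables(Map<String, PartitionInput> first, Map<String, PartitionInput> second) {
        return tablesOf(first).equals(tablesOf(second));
    }

    /** Exhaustive check: every input path and the partition behind it must match. */
    static boolean sameInputs(Map<String, PartitionInput> first, Map<String, PartitionInput> second) {
        return first.equals(second);
    }

    public static void main(String[] args) {
        // Two UNION ALL branches over the same table, each reading a different partition.
        Map<String, PartitionInput> branch1 = Map.of(
                "/warehouse/a1/end_dt=20161020", new PartitionInput("a1", "end_dt=20161020"));
        Map<String, PartitionInput> branch2 = Map.of(
                "/warehouse/a1/end_dt=20161021", new PartitionInput("a1", "end_dt=20161021"));
        System.out.println(sameTables(branch1, branch2)); // true: would wrongly allow merging
        System.out.println(sameInputs(branch1, branch2)); // false: the works stay separate
    }
}
```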
> hive on spark combine equivalentwork get wrong result because of tablescan
> operation compare
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-15239
> URL: https://issues.apache.org/jira/browse/HIVE-15239
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.2.0, 2.1.0
> Reporter: wangwenli
> Assignee: Rui Li
> Attachments: HIVE-15239.1.patch, HIVE-15239.2.patch
>
>
> Environment: Hive on Spark engine
> Steps to reproduce:
> {code}
> create table a1(KEHHAO string, START_DT string) partitioned by (END_DT string);
> create table a2(KEHHAO string, START_DT string) partitioned by (END_DT string);
> alter table a1 add partition(END_DT='20161020');
> alter table a1 add partition(END_DT='20161021');
> insert into table a1 partition(END_DT='20161020') values('2000721360','20161001');
> SELECT T1.KEHHAO,COUNT(1) FROM (
>   SELECT KEHHAO FROM a1 T
>   WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
>   UNION ALL
>   SELECT KEHHAO FROM a2 T
>   WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
> ) T1
> GROUP BY T1.KEHHAO
> HAVING COUNT(1)>1;
> +-------------+------+--+
> | t1.kehhao | _c1 |
> +-------------+------+--+
> | 2000721360 | 2 |
> +-------------+------+--+
> {code}
> The result should contain no records.
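Why the expected result is empty: `a1` holds one row in partition `END_DT='20161020'` that passes the filter (20161001 ≤ 20161018 ≤ 20161019), and `a2` is empty, so `COUNT(1)` for `'2000721360'` is 1 and `HAVING COUNT(1)>1` removes it. A count of 2 is what you get if the `a2` branch ends up reading `a1`'s data, which is consistent with the issue title (the two TableScan-rooted works being judged equivalent and combined). The Java sketch below simulates both cases in memory; `Row`, `matches` and `run` are hypothetical helpers, and evaluating the `BETWEEN` numerically is an assumption about Hive's implicit casts that does not change the outcome for this data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory simulation of the reported UNION ALL + GROUP BY + HAVING query. */
class UnionCountSketch {

    static final class Row {
        final String kehhao;
        final String startDt;
        final String endDt; // partition column END_DT

        Row(String kehhao, String startDt, String endDt) {
            this.kehhao = kehhao;
            this.startDt = startDt;
            this.endDt = endDt;
        }
    }

    // WHERE KEHHAO = '2000721360' AND '20161018' BETWEEN START_DT AND END_DT-1,
    // evaluated numerically here (assumption).
    static boolean matches(Row r) {
        long target = 20161018L;
        return r.kehhao.equals("2000721360")
                && Long.parseLong(r.startDt) <= target
                && target <= Long.parseLong(r.endDt) - 1;
    }

    // Filters each branch, concatenates them (UNION ALL), counts per KEHHAO, then applies HAVING COUNT(1) > 1.
    static Map<String, Long> run(List<Row> firstBranchInput, List<Row> secondBranchInput) {
        Map<String, Long> counts = new HashMap<>();
        for (List<Row> branch : List.of(firstBranchInput, secondBranchInput)) {
            for (Row r : branch) {
                if (matches(r)) {
                    counts.merge(r.kehhao, 1L, Long::sum);
                }
            }
        }
        counts.values().removeIf(c -> c <= 1); // HAVING COUNT(1) > 1
        return counts;
    }

    public static void main(String[] args) {
        List<Row> a1 = List.of(new Row("2000721360", "20161001", "20161020"));
        List<Row> a2 = List.of(); // a2 has no data
        System.out.println(run(a1, a2)); // {}  (correct: no records)
        System.out.println(run(a1, a1)); // {2000721360=2}  (both branches read a1)
    }
}
```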