[jira] [Comment Edited] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare

Xuefu Zhang (JIRA) Wed, 23 Nov 2016 16:26:31 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691764#comment-15691764
 ]


Xuefu Zhang edited comment on HIVE-15239 at 11/24/16 12:25 AM:
---------------------------------------------------------------

Patch looks good. A few minor comments:

1. The following code seems to have been inserted in the middle of another 
code block.
{code}
      // need to check paths and partition desc for MapWorks
      if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
        return false;
      }
{code}
2. By convention, the null check and the null-equality check are better done 
in the compare method itself than left to the caller. This applies to the few 
private methods introduced, though it's no big deal.
3. I wonder whether it makes sense to put these compare() methods in the 
corresponding classes. Otherwise, these comparisons can easily be broken.
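
To illustrate points 2 and 3, here is a minimal sketch; it is not the patch's code. PartDesc, its two fields, and the method names sameAs() and equivalent() are hypothetical stand-ins for the real plan classes. The idea is only that the null handling lives inside the comparison, and the comparison lives in the class that owns the fields.

```java
import java.util.Objects;

// Hypothetical stand-in for a plan descriptor; NOT a Hive class.
final class PartDesc {
  private final String tableName;
  private final String inputFormat;

  PartDesc(String tableName, String inputFormat) {
    this.tableName = tableName;
    this.inputFormat = inputFormat;
  }

  // The comparison sits next to the fields it reads, so a newly added
  // field is harder to overlook here than in a distant utility class.
  boolean sameAs(PartDesc other) {
    return other != null
        && Objects.equals(tableName, other.tableName)
        && Objects.equals(inputFormat, other.inputFormat);
  }

  // Null-safe entry point: callers need no null checks of their own.
  static boolean equivalent(PartDesc first, PartDesc second) {
    if (first == null || second == null) {
      return first == second; // equal only when both are null
    }
    return first.sameAs(second);
  }
}
```

With this shape a caller can write PartDesc.equivalent(a, b) without guarding either argument.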

One concern I have is whether the comparisons are exhaustive, that is, whether 
the condition check is sufficient. With so many noisy fields in the compared 
classes, it's hard to see which are important and which are not.
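
A rough sketch of why exhaustiveness matters, under simplifying assumptions: SimpleWork below is a hypothetical model, not Hive's MapWork, and its two fields stand in for whatever the real classes carry. If a comparison skips a field that actually distinguishes two works, the works look equivalent even though they read different inputs.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified model of a map-side work unit; NOT Hive's MapWork.
final class SimpleWork {
  final List<String> operators; // operator chain, e.g. "TS", "FIL", "SEL"
  final Set<String> inputPaths; // locations the scan reads (assumed field)

  SimpleWork(List<String> operators, Set<String> inputPaths) {
    this.operators = operators;
    this.inputPaths = inputPaths;
  }

  // Insufficient check: looks at the operator chain only.
  static boolean sameOperators(SimpleWork a, SimpleWork b) {
    return Objects.equals(a.operators, b.operators);
  }

  // Stricter check: the operator chain and the inputs must both match.
  static boolean sameOperatorsAndInputs(SimpleWork a, SimpleWork b) {
    return sameOperators(a, b) && Objects.equals(a.inputPaths, b.inputPaths);
  }
}
```

Two works with the same operator chain but different input paths pass sameOperators() and fail sameOperatorsAndInputs(). Any other distinguishing field left out of the comparison would cause the same kind of false match.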

Thoughts?


was (Author: xuefuz):
Patch looks good. Two minor comments:

1. The following code seems to have been inserted in the middle of another 
code block.
{code}
      // need to check paths and partition desc for MapWorks
      if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
        return false;
      }
{code}
2. By convention, the null check and the null-equality check are better done 
in the compare method itself than left to the caller. This applies to the few 
private methods introduced, though it's no big deal.
3. I wonder whether it makes sense to put these compare() methods in the 
corresponding classes. Otherwise, these comparisons can easily be broken.

One concern I have is whether the comparisons are exhaustive, that is, whether 
the condition check is sufficient. With so many noisy fields in the compared 
classes, it's hard to see which are important and which are not.

Thoughts?

> hive on spark combine equivalentwork get wrong result because of  tablescan 
> operation compare
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15239
>                 URL: https://issues.apache.org/jira/browse/HIVE-15239
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.0, 2.1.0
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15239.1.patch
>
>
> env: hive on spark engine
> steps to reproduce:
> {code}
> create table a1(KEHHAO string, START_DT string) partitioned by (END_DT string);
> create table a2(KEHHAO string, START_DT string) partitioned by (END_DT string);
> alter table a1 add partition(END_DT='20161020');
> alter table a1 add partition(END_DT='20161021');
> insert into table a1 partition(END_DT='20161020') values('2000721360','20161001');
> SELECT T1.KEHHAO,COUNT(1) FROM (
> SELECT KEHHAO FROM a1 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
> UNION ALL
> SELECT KEHHAO FROM a2 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
> ) T1
> GROUP BY T1.KEHHAO
> HAVING COUNT(1)>1;
> +-------------+------+--+
> |  t1.kehhao  | _c1  |
> +-------------+------+--+
> | 2000721360  | 2    |
> +-------------+------+--+
> {code}
> the result should be empty (no records)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
