cocopc opened a new issue #1736:
URL: https://github.com/apache/hudi/issues/1736


   Env:
   Hive 2.1.1
   Hudi: 0.5.2
   Spark: 2.4.5
   
   The table is a MOR table written with the upsert operation. Querying it with spark-sql returns the correct
result, but querying it with Hive on MR returns the wrong result.
   My Table Info:
   Table Name: user
   Record Key: distinct_id
   
   SQL: select distinct_id, count(1) from user group by distinct_id order by distinct_id desc limit 10
   Query with Spark: the result is correct.
   +-----------+--------+
   |distinct_id|count(1)|
   +-----------+--------+
   |   51819928|       1|
   |   51819908|       1|
   |   51819791|       1|
   |   51819580|       1|
   |   51819136|       1|
   |   51819001|       1|
   |   51818734|       1|
   |   51818645|       1|
   |   51818417|       1|
   |   51818329|       1|
   +-----------+--------+
   
   Query with Hive: the result is wrong. The count should be 1 for each
distinct_id, because distinct_id is the record key and upserts should be merged.
   +--------------+-----+--+
   | distinct_id  | c1  |
   +--------------+-----+--+
   | 51819928     | 8   |
   | 51819908     | 22  |
   | 51819791     | 7   |
   | 51819580     | 11  |
   | 51819136     | 9   |
   | 51819001     | 24  |
   | 51818734     | 9   |
   | 51818645     | 23  |
   | 51818417     | 22  |
   | 51818329     | 26  |
   +--------------+-----+--+
   
   Query with Hive: select * from user where distinct_id='51819928';
   This query returns only one row, which is correct. So strange!
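
   To make the expectation concrete, here is a minimal Python sketch of the merge semantics I expect from upsert. It is not Hudi code, and the data and the helper names (`versions`, `merge_latest`, `count_per_key`) are hypothetical, chosen only for illustration. If several versions of the same record key are counted without being merged, the per-key count is greater than 1, which looks like the Hive result above. If only the latest version per key is kept, every count is 1, which matches the Spark result.

```python
from collections import Counter

# Hypothetical record versions: (record_key, commit_time, payload).
# Illustrative data only; it is not taken from the table in this issue.
versions = [
    ("51819928", "001", {"city": "a"}),
    ("51819928", "002", {"city": "b"}),
    ("51819928", "003", {"city": "c"}),
    ("51819908", "001", {"city": "x"}),
    ("51819908", "002", {"city": "y"}),
]


def merge_latest(rows):
    """Keep only the newest version per record key (the expected upsert result)."""
    latest = {}
    for key, commit_time, payload in rows:
        if key not in latest or commit_time > latest[key][0]:
            latest[key] = (commit_time, payload)
    return latest


def count_per_key(keys):
    """Equivalent of: select key, count(1) ... group by key."""
    return dict(Counter(keys))


# Counting every stored version, with no merge applied.
unmerged_counts = count_per_key(key for key, _, _ in versions)
# Counting after keeping only the latest version of each key.
merged_counts = count_per_key(merge_latest(versions).keys())

print(unmerged_counts)  # {'51819928': 3, '51819908': 2}
print(merged_counts)    # {'51819928': 1, '51819908': 1}
```

   Whether the Hive query is really reading unmerged versions is only my guess; the sketch just shows why counts above 1 for a record key point in that direction.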
   

