gumengchao created HIVE-23304:
---------------------------------
Summary: Two SQL execution results with the same semantics are not
the same
Key: HIVE-23304
URL: https://issues.apache.org/jira/browse/HIVE-23304
Project: Hive
Issue Type: Bug
Components: Hive, SQL
Affects Versions: 2.3.7
Environment: hadoop version 2.7.7
hive version 2.3.7
hive.execution.engine=mr
Reporter: gumengchao
execute sql
{noformat}
SELECT AVG(a.rate), a.movieid, MAX(b.userid)
FROM t_rating a
INNER JOIN (
SELECT a.movieid AS movieid, a.userid AS userid, a.rate
FROM t_rating a
JOIN (
SELECT COUNT(*) AS rate_times, a.userid AS
userid
FROM t_rating a
JOIN (
SELECT *
FROM t_user
WHERE sex = 'F'
) b
ON a.userid = b.userid
GROUP BY a.userid
ORDER BY rate_times DESC
LIMIT 0, 1
) b
ON a.userid = b.userid
ORDER BY a.rate DESC
LIMIT 0, 10
) b
ON a.movieid = b.movieid
GROUP BY a.movieid{noformat}
The result is below.But this is not the result I want
{noformat}
4.063136456211812 162 1150
4.476190476190476 904 1150
4.249370277078086 951 1150
4.14167916041979 1230 1150
3.6464646464646466 1966 1150
4.163043478260869 2330 1150
3.7039473684210527 3163 1150
4.387453874538745 3307 1150
4.047363717605005 3671 1150
3.8265682656826567 3675 1150{noformat}
So I divided sql into two steps
fisrt:
{noformat}
create temporary table tmp as
SELECT a.movieid AS movieid, a.userid AS userid, a.rate
FROM t_rating a
JOIN (
SELECT COUNT(*) AS rate_times, a.userid AS
userid
FROM t_rating a
JOIN (
SELECT *
FROM t_user
WHERE sex = 'F'
) b
ON a.userid = b.userid
GROUP BY a.userid
ORDER BY rate_times DESC
LIMIT 0, 1
) b
ON a.userid = b.userid
ORDER BY a.rate DESC
LIMIT 0, 10{noformat}
second:
{noformat}
SELECT AVG(a.rate), a.movieid, MAX(b.userid)
FROM t_rating a INNER JOIN tmp b
ON a.movieid = b.movieid
GROUP BY a.movieid{noformat}
the result
{noformat}
4.52054794520548 745 1150
4.4498902706656915 750 1150
4.476190476190476 904 1150
4.280748663101604 905 1150
3.7314890154597236 1094 1150
4.188888888888889 1236 1150
4.21043771043771 1256 1150
3.747422680412371 1279 1150
4.0739348370927315 2064 1150
4.125390450691656 2997 1150{noformat}
Why the results of two executions are different?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)