Paul Rogers created IMPALA-8026:
-----------------------------------

             Summary: Actual row counts for nested loop join are meaningless
                 Key: IMPALA-8026
                 URL: https://issues.apache.org/jira/browse/IMPALA-8026
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers

Consider this extract from a query plan:

{noformat}
Operator                   #Rows  Est. #Rows
--------------------------------------------
…
|  10:HASH JOIN            9.53M      18.14K
|  |--19:EXCHANGE              1           1
|  |  00:SCAN HDFS             1           1
|  06:NESTED LOOP JOIN     4.88B     863.84K
|  |--18:EXCHANGE              1           1
|  |  04:SCAN HDFS             1           1
|  05:HASH JOIN            9.53M     863.84K
{noformat}

If the above is to be believed, the 06 nested loop join produced about 4.88 billion rows. That number is far too large: joining 1 row with 9.53 million rows cannot produce roughly 500 times that number of rows.

It appears that the nested loop join actually processed and returned the 9.53 million rows, since that is the same number produced by the 10 hash join, which joins a single row with the output of the nested loop join.

Because this same bogus result appears across multiple plans, it is likely that the reported actual row count is simply wrong and bears no relation to the number of rows actually returned.
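
The arithmetic behind the argument above can be sketched as a quick sanity check. This is only an illustration, not Impala code: `parse_count` and `max_join_rows` are hypothetical helpers, and the check assumes that a join can emit at most the cross product of its two inputs and that the "K", "M" and "B" suffixes mean thousands, millions and billions.

```python
# Hypothetical helpers for illustration only; not part of Impala.
SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}

def parse_count(text):
    """Convert a summary-style count such as '9.53M' to an approximate integer."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)

def max_join_rows(left_rows, right_rows):
    """Upper bound on join output, assuming at most the cross product is emitted."""
    return left_rows * right_rows

# Values copied from the plan extract in this report.
left_rows = parse_count("9.53M")   # #Rows of 05:HASH JOIN
right_rows = parse_count("1")      # #Rows of 18:EXCHANGE
reported = parse_count("4.88B")    # #Rows of 06:NESTED LOOP JOIN

bound = max_join_rows(left_rows, right_rows)
print(f"upper bound on join output: {bound:,}")
print(f"reported #Rows:             {reported:,}")
print(f"reported / bound:           {reported / bound:.0f}x")
```

With the numbers from the extract, the reported count exceeds the bound by a factor of about 500, which is the discrepancy described above.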