Paul Rogers created IMPALA-8026:
-----------------------------------

             Summary: Actual row counts for nested loop join are meaningless
                 Key: IMPALA-8026
                 URL: https://issues.apache.org/jira/browse/IMPALA-8026
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


Consider this extract from a query plan:

{noformat}
Operator                      #Rows  Est. #Rows
--------------------------------------------------------------
…
|  10:HASH JOIN               9.53M      18.14K 
|  |--19:EXCHANGE                 1           1
|  |  00:SCAN HDFS                1           1
|  06:NESTED LOOP JOIN        4.88B     863.84K 
|  |--18:EXCHANGE                 1           1
|  |  04:SCAN HDFS                1           1
|  05:HASH JOIN               9.53M     863.84K
{noformat}

If the above is to be believed, the 06 nested loop join produced about 5 
billion rows. But that number is implausibly large: joining 1 row with about 
10 million rows cannot produce 500 times that many rows.
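
The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration, not Impala code: the helper names `parse_rows` and `max_join_rows` are made up here, the K/M/B suffixes are assumed to mean 10^3, 10^6 and 10^9, and the bound assumes an inner or cross join (an outer join could add a few unmatched rows, but not orders of magnitude). The probe and build inputs are read off the plan extract above, where the `|--` branch is the build side.

```python
# Multipliers for the suffixes used in the summary's row-count columns
# (assumed meaning: thousand, million, billion).
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_rows(text):
    """Convert a summary value such as '9.53M' or '1' to a float."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def max_join_rows(left_rows, right_rows):
    """Upper bound on inner/cross join output: the full cross product."""
    return left_rows * right_rows


probe = parse_rows("9.53M")     # 05:HASH JOIN, probe input of 06
build = parse_rows("1")         # 18:EXCHANGE, build input of 06
reported = parse_rows("4.88B")  # 06:NESTED LOOP JOIN, reported #Rows

bound = max_join_rows(probe, build)
ratio = reported / bound
exceeds_bound = reported > bound

print(f"bound={bound:.0f} reported={reported:.0f} ratio={ratio:.0f}x")
print("reported count exceeds the cross-product bound:", exceeds_bound)
```

With the values from the extract, the reported count is roughly 500 times the largest output the join could have produced, which is the discrepancy described above.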

It appears that the nested loop join actually returned about 9.5 million 
rows: that is the count it received from the 05 hash join, and the same count 
is reported by the 10 hash join, which joins a single row with the output of 
the nested loop join.

Because this same bogus result appears across multiple plans, it is likely that 
the "actual" row count reported for nested loop joins is simply wrong and bears 
no relation to the number of rows actually returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
