Zoltan Haindrich created HIVE-24376:
---------------------------------------

             Summary: SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
                 Key: HIVE-24376
                 URL: https://issues.apache.org/jira/browse/HIVE-24376
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Haindrich


The mode name is also a bit confusing, but here is what happens:

{code}
TS[A1] -> ...
TS[A2] -> JOIN
TS[B]  -> JOIN
{code}

We have an SJ (semijoin) edge TS[B] -> TS[A2] that communicates information about the join keys; let's assume the reduction ratio was r.

RemoveSemijoin right now does the following:
* removes the semijoin edge (so TS[A2] will become a full scan)
* merges TS[A1] and TS[A2]

With respect to reading data from disk, this is great: we used to access A twice, one of which was a full scan, and now we only read it once.

But from a row-traffic perspective, TS[A2] emits more rows from now on, because the semijoin reduction with ratio r is no longer applied.
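
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Hive code: `PlanCost`, `plan_costs` and the plan labels are hypothetical names, it only counts how many times A is accessed (not bytes read), and it assumes r is the fraction of A's rows that survive the semijoin filter. The third entry models the idea in the summary, keeping the SJ filter condition on the A2 branch after the scans are merged.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PlanCost:
    """Toy cost of one plan shape; not a Hive data structure."""
    scans_of_a: int        # how many times table A is accessed on disk
    rows_into_join: float  # rows the A2 branch sends towards the JOIN


def plan_costs(n_rows: int, r: float) -> Dict[str, PlanCost]:
    """Compare plan shapes for a table A holding n_rows rows.

    Assumption: r is the fraction of A's rows that survive the semijoin
    filter (0 < r <= 1), so a smaller r means a stronger reduction.
    """
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    return {
        # TS[A1] and TS[A2] are separate scans; the SJ edge filters TS[A2].
        "before": PlanCost(scans_of_a=2, rows_into_join=r * n_rows),
        # Current RemoveSemijoin: one merged scan, SJ reduction is lost.
        "merged_no_filter": PlanCost(scans_of_a=1, rows_into_join=float(n_rows)),
        # Idea from the summary: one merged scan, SJ filter condition kept
        # on the A2 branch.
        "merged_keep_filter": PlanCost(scans_of_a=1, rows_into_join=r * n_rows),
    }


if __name__ == "__main__":
    for name, cost in plan_costs(1_000_000, 0.25).items():
        print(name, cost)
```

Under these assumptions, with 1,000,000 rows and r = 0.25, the merged plan without the filter sends four times as many rows from the A2 branch into the join as the original plan did, while a merged plan that retains the filter keeps both the single read of A and the reduced row traffic.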