[ https://issues.apache.org/jira/browse/SPARK-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-9258:
-------------------------------
    Description:
We have 4 semi join operators, and they are not really necessary. We can still use an equi-join operator to do the join, and just not include any values from the other side of the join.

We waste a little space by building a hash map rather than a hash set, but at the end of the day, unless we are going to spend a lot of time optimizing a hash set, our Tungsten hash map will be a lot more efficient than the hash set anyway. This way, semi-join automatically benefits from all the work we do in Tungsten.

  was:
We have more join operators than we have resources to optimize. In this case, BroadcastLeftSemiJoinHash isn't really necessary. We can still use an equi-join operator to do the join, and just not include any values from the other side of the join.

We waste a little space by building a hash map rather than a hash set, but at the end of the day, unless we are going to spend a lot of time optimizing a hash set, our Tungsten hash map will be a lot more efficient than the hash set anyway ...

> Remove all semi join physical operator
> --------------------------------------
>
>                 Key: SPARK-9258
>                 URL: https://issues.apache.org/jira/browse/SPARK-9258
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>
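The trade-off described in this issue can be illustrated with a minimal sketch in plain Python. This is not Spark code: the function names, the dict-based row representation, and the `key` parameter are all hypothetical and chosen only for illustration. It shows a left semi join implemented two ways: by reusing the hash map an equi-join would build (and simply never emitting build-side values), and by building a dedicated hash set of keys.

```python
from collections import defaultdict


def build_hash_map(build_rows, key):
    """Build side of an ordinary hash equi-join: key -> list of rows.

    Rows with a None key are skipped, since NULL keys never match in SQL.
    """
    table = defaultdict(list)
    for row in build_rows:
        if row[key] is not None:
            table[row[key]].append(row)
    return table


def left_semi_join_via_hash_map(left_rows, right_rows, key):
    """Left semi join that reuses the equi-join hash map.

    Only the existence of the key is checked; the stored right-side
    rows are never read, so they are the "wasted" space.
    """
    table = build_hash_map(right_rows, key)
    return [row for row in left_rows
            if row[key] is not None and row[key] in table]


def left_semi_join_via_hash_set(left_rows, right_rows, key):
    """Left semi join with a dedicated hash set holding keys only."""
    keys = {row[key] for row in right_rows if row[key] is not None}
    return [row for row in left_rows
            if row[key] is not None and row[key] in keys]


if __name__ == "__main__":
    left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    right = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 3, "v": 30}]
    print(left_semi_join_via_hash_map(left, right, "id"))
    print(left_semi_join_via_hash_set(left, right, "id"))
```

Both functions return the same rows, and each left row appears at most once even when the right side has duplicate keys. The hash-map variant stores more than it needs, which is the space cost the description accepts in exchange for maintaining and optimizing a single hash structure.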