[ https://issues.apache.org/jira/browse/SPARK-41509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-41509:
-----------------------------------

    Assignee: jiaan.geng

> Delay execution hash until after aggregation for semi-join runtime filter.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-41509
>                 URL: https://issues.apache.org/jira/browse/SPARK-41509
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: jiaan.geng
>            Assignee: jiaan.geng
>            Priority: Major
>
> Currently, the Spark runtime filter supports a bloom filter and an
> in-subquery filter. The in-subquery filter always executes Murmur3Hash
> before aggregating the join key.
> Because the data is larger before aggregation than after it, we can delay
> executing Murmur3Hash until after aggregation for the semi-join runtime
> filter. This reduces the number of calls to Murmur3Hash and improves
> performance.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
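
The effect described in the issue can be illustrated outside Spark with a small sketch. This is not Spark's implementation: `zlib.crc32` is only a stand-in for Murmur3Hash, the helper names are hypothetical, and the "aggregation" is modelled as plain deduplication of join keys. It only shows why hashing after deduplication needs fewer hash calls while producing the same set of hashed values.

```python
import zlib


def make_counting_hash():
    """Return a hash function plus a counter recording how often it is called.

    zlib.crc32 is an assumption used as a stand-in for Murmur3Hash.
    """
    counter = {"calls": 0}

    def hash_fn(key):
        counter["calls"] += 1
        return zlib.crc32(repr(key).encode("utf-8"))

    return hash_fn, counter


def hash_then_aggregate(keys, hash_fn):
    # Current behaviour per the issue: hash every row, then deduplicate.
    return {hash_fn(k) for k in keys}


def aggregate_then_hash(keys, hash_fn):
    # Proposed behaviour: deduplicate the join keys first, then hash
    # only the distinct keys that remain.
    return {hash_fn(k) for k in set(keys)}


# 10,000 rows but only 100 distinct join keys.
keys = [i % 100 for i in range(10_000)]

eager_hash, eager_counter = make_counting_hash()
eager_values = hash_then_aggregate(keys, eager_hash)

delayed_hash, delayed_counter = make_counting_hash()
delayed_values = aggregate_then_hash(keys, delayed_hash)

print(eager_counter["calls"], delayed_counter["calls"])
```

Both orderings yield the same set of hashed keys; the saving grows with the ratio of total rows to distinct join keys, so it disappears when every key is already unique.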