Ke Jia created SPARK-28046:
------------------------------

             Summary: OOM caused by building hash table when the compressed ratio of small table is normal
                 Key: SPARK-28046
                 URL: https://issues.apache.org/jira/browse/SPARK-28046
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Ke Jia
Currently, Spark converts a sort merge join (SMJ) to a broadcast hash join (BHJ) when the small table's compressed size is <= the broadcast threshold. Like Spark, Adaptive Execution (AE) also converts an SMJ to a BHJ at runtime based on the compressed size. In our test, with AE enabled and a 32MB broadcast threshold, one SMJ whose small side was 16MB compressed was converted to a BHJ. However, when building the hash table, the 16MB small table decompressed to 2GB and contained 134,485,048 rows, with a large number of continuous, repeated values. As a result, the following OOM exception occurred while building the hash table:

!image-2019-06-14-10-29-00-499.png!

Based on this finding, it may not be reasonable to decide whether an SMJ is converted to a BHJ based only on the compressed size. In AE, we added a condition that estimates the decompressed size from the row count. Spark may likewise need a decompressed-size or row-count condition, not just the compressed-size check, when converting an SMJ to a BHJ.
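A minimal sketch of the proposed guard, assuming per-table statistics (compressed size, row count, average row width) are available to the planner. All names here are hypothetical illustrations, not Spark's actual planner code or configs; only spark.sql.autoBroadcastJoinThreshold exists as a real setting (default 10MB; 32MB below matches the test described above):

{code:scala}
// Hypothetical sketch, not Spark's actual JoinSelection / AE logic.
case class SmallTableStats(
    compressedSizeInBytes: Long,
    rowCount: Option[Long],
    avgRowWidthInBytes: Long)

object BroadcastDecision {

  // 32MB threshold as in the reported test; the in-memory cap is an
  // illustrative new knob, not an existing Spark config.
  val broadcastThreshold: Long = 32L * 1024 * 1024
  val maxEstimatedInMemoryBytes: Long = 256L * 1024 * 1024

  def canConvertToBroadcastHashJoin(stats: SmallTableStats): Boolean = {
    // Current behavior: compare only the compressed (on-disk) size.
    val compressedOk = stats.compressedSizeInBytes <= broadcastThreshold

    // Proposed extra condition: estimate the decompressed, in-memory size
    // from the row count, so a highly compressible table (16MB -> 2GB in
    // the reported case) is not broadcast.
    val decompressedOk = stats.rowCount match {
      case Some(rows) => rows * stats.avgRowWidthInBytes <= maxEstimatedInMemoryBytes
      case None       => true // no row count available; fall back to old behavior
    }

    compressedOk && decompressedOk
  }
}
{code}

Applied to the reported case, the extra condition rejects the conversion even though the compressed size passes:

{code:scala}
// 16MB compressed, 134,485,048 rows at ~16 bytes/row (~2GB decompressed).
val stats = SmallTableStats(16L * 1024 * 1024, Some(134485048L), avgRowWidthInBytes = 16)
BroadcastDecision.canConvertToBroadcastHashJoin(stats) // false under the sketch above
{code}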