jin xing created SPARK-22334:
--------------------------------

             Summary: Check table size from HDFS in case the size in metastore 
is wrong
                 Key: SPARK-22334
                 URL: https://issues.apache.org/jira/browse/SPARK-22334
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: jin xing


Currently we use the table property 'totalSize' to decide whether to use a 
broadcast join. Table properties come from the metastore, but they can be 
wrong: Hive sometimes fails to update table properties even after producing 
data successfully (e.g. a NameNode timeout, see 
https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L180).
 If 'totalSize' in the table properties is much smaller than the table's real 
size on HDFS, Spark can launch a broadcast join by mistake and suffer an OOM.
Could we add a defensive config and check the size from HDFS when 'totalSize' 
is below {{spark.sql.autoBroadcastJoinThreshold}}?
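A minimal sketch of the proposed defense (the class and method names below are 
illustrative, not actual Spark internals): when the metastore-reported size is 
below the broadcast threshold, lazily re-check the real size on HDFS and trust 
the larger value, so a stale 'totalSize' cannot trigger a broadcast join on a 
huge table.

```java
import java.util.function.LongSupplier;

// Hypothetical helper sketching the proposed check; not Spark's actual code.
public class TableSizeCheck {
    /**
     * Returns the size Spark should use for join planning. The HDFS size is
     * supplied lazily so the (potentially expensive) file-system scan only
     * runs when the metastore value looks suspiciously small.
     */
    public static long effectiveSizeInBytes(long metastoreTotalSize,
                                            LongSupplier sizeOnHdfs,
                                            long autoBroadcastJoinThreshold) {
        if (metastoreTotalSize >= 0 && metastoreTotalSize < autoBroadcastJoinThreshold) {
            // Suspiciously small: verify against HDFS before allowing a broadcast.
            return Math.max(metastoreTotalSize, sizeOnHdfs.getAsLong());
        }
        // Already above the threshold: a broadcast won't be chosen anyway,
        // so the metastore value is safe to use as-is.
        return metastoreTotalSize;
    }
}
```

With this rule, a table whose metastore 'totalSize' is stale (say 1 KB) but 
whose files actually total 100 GB would report 100 GB and be excluded from 
broadcast, while tables already above the threshold pay no extra HDFS call.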



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
