[ https://issues.apache.org/jira/browse/SPARK-22334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-22334:
------------------------------------

    Assignee:     (was: Apache Spark)

> Check table size from HDFS in case the size in metastore is wrong.
> ------------------------------------------------------------------
>
>                 Key: SPARK-22334
>                 URL: https://issues.apache.org/jira/browse/SPARK-22334
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: jin xing
>
> Currently we use the table property 'totalSize' to decide whether to use a 
> broadcast join. Table properties come from the metastore, but they can be 
> wrong: Hive sometimes fails to update table properties even after producing 
> data successfully (e.g., a NameNode timeout, see 
> https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L180).
> If 'totalSize' in the table properties is much smaller than the table's real 
> size on HDFS, Spark can launch a broadcast join by mistake and suffer an OOM.
> Could we add a defensive config and check the size from HDFS when 'totalSize' 
> is below {{spark.sql.autoBroadcastJoinThreshold}}?
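
A minimal sketch of what such a defensive check could look like, assuming the metastore's 'totalSize' and the table's storage location are already known. The object and method names below are illustrative only and are not part of Spark's internals or the actual patch:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object TableSizeCheck {

  /** Total bytes under `tableLocation`, as reported by the file system itself. */
  def sizeOnFileSystem(tableLocation: String, hadoopConf: Configuration): Long = {
    val path = new Path(tableLocation)
    val fs = FileSystem.get(path.toUri, hadoopConf)
    fs.getContentSummary(path).getLength
  }

  /**
   * Decide whether a table may be broadcast. If the metastore's 'totalSize'
   * is below the broadcast threshold, re-check against HDFS before trusting it,
   * so a stale or missing 'totalSize' does not trigger a broadcast join and OOM.
   */
  def canBroadcast(
      metastoreTotalSize: Long,
      tableLocation: String,
      broadcastThreshold: Long, // value of spark.sql.autoBroadcastJoinThreshold
      hadoopConf: Configuration): Boolean = {
    if (metastoreTotalSize > broadcastThreshold) {
      false
    } else {
      // 'totalSize' claims the table is small; verify against HDFS before broadcasting.
      sizeOnFileSystem(tableLocation, hadoopConf) <= broadcastThreshold
    }
  }
}
{code}

The extra HDFS call (getContentSummary) would only be paid when the metastore size is already under the threshold, i.e., exactly in the case where a wrong 'totalSize' could cause a mistaken broadcast; such a check could be gated behind the proposed defensive config.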



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
