Parth Brahmbhatt created SPARK-17247:
----------------------------------------

             Summary: When fallback to HDFS is enabled for stats calculation, 
the HDFS listing and size calculation should be terminated as soon as the total 
size exceeds the broadcast threshold
                 Key: SPARK-17247
                 URL: https://issues.apache.org/jira/browse/SPARK-17247
             Project: Spark
          Issue Type: Bug
            Reporter: Parth Brahmbhatt


Currently, when a user enables spark.sql.statistics.fallBackToHdfs and no stats 
are available from the metastore, we fall back to HDFS to compute the size. This 
is a useful join optimization, but the HDFS listing and size calculation can 
slow things down. To speed up the operation, we could stop the size calculation 
as soon as the running total exceeds the broadcast threshold, because the exact 
size is not important beyond that point.
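
The proposed early termination could look roughly like the sketch below. It is
only an illustration, not Spark's implementation: the names
size_exceeds_threshold and counting_listing are hypothetical, a plain Python
iterable of file sizes stands in for the HDFS listing, and the threshold
constant assumes the 10 MB default of spark.sql.autoBroadcastJoinThreshold.

```python
from typing import Dict, Iterable, Iterator

# Assumed threshold: the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def size_exceeds_threshold(file_sizes: Iterable[int], threshold: int) -> bool:
    """Return True as soon as the running total exceeds `threshold`.

    `file_sizes` is a lazy stand-in for an HDFS listing; entries after the
    one that crosses the threshold are never consumed.
    """
    total = 0
    for size in file_sizes:
        total += size
        if total > threshold:
            return True  # stop here; the exact total no longer matters
    return False


def counting_listing(sizes: Iterable[int], counter: Dict[str, int]) -> Iterator[int]:
    """Hypothetical lazy listing that records how many entries were visited."""
    for size in sizes:
        counter["visited"] += 1
        yield size


if __name__ == "__main__":
    counter = {"visited": 0}
    listing = counting_listing([6 * 1024 * 1024] * 1000, counter)
    too_big = size_exceeds_threshold(listing, BROADCAST_THRESHOLD_BYTES)
    print(too_big, counter["visited"])
```

Because the listing is consumed lazily, a table with many files is abandoned
after the first few entries once it is known to be too large to broadcast,
while small tables are still summed in full.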



