Parth Brahmbhatt created SPARK-17247:
----------------------------------------
Summary: When fallback to HDFS is enabled for stats calculation,
the HDFS listing and size calculation should be terminated as soon as the
total size exceeds the broadcast threshold
Key: SPARK-17247
URL: https://issues.apache.org/jira/browse/SPARK-17247
Project: Spark
Issue Type: Bug
Reporter: Parth Brahmbhatt
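Currently, when a user enables spark.sql.statistics.fallBackToHdfs and no stats
are available from the metastore, we fall back to HDFS to compute the table
size. This is a useful join optimization, but listing the files and summing
their sizes can slow things down. To speed up the operation, we could stop the
size calculation as soon as the running total exceeds the broadcast threshold
(spark.sql.autoBroadcastJoinThreshold), because the exact size no longer
matters once the table is known to be too large to broadcast.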
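
A minimal sketch of the proposed early exit, written in Python rather than
Spark's Scala and walking a local directory in place of HDFS. The function
names (iter_file_sizes, capped_total_size) and the threshold value are
hypothetical and only illustrate the idea; this is not Spark's implementation.

```python
import os
from typing import Iterable, Iterator, Tuple


def iter_file_sizes(root: str) -> Iterator[int]:
    """Lazily yield the size in bytes of each file under root.

    Stand-in for a recursive HDFS listing (assumption: local filesystem).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.getsize(os.path.join(dirpath, name))


def capped_total_size(sizes: Iterable[int], threshold_bytes: int) -> Tuple[int, bool]:
    """Sum sizes, stopping as soon as the running total exceeds the threshold.

    Returns (total_so_far, exceeded). When exceeded is True the total is only
    a lower bound, which is enough to rule out a broadcast join.
    """
    total = 0
    for size in sizes:
        total += size
        if total > threshold_bytes:
            # Remaining files are never listed or measured.
            return total, True
    return total, False


if __name__ == "__main__":
    # Hypothetical threshold of 10 MB for illustration.
    threshold = 10 * 1024 * 1024
    total, exceeded = capped_total_size(iter_file_sizes("."), threshold)
    print("exceeded" if exceeded else "within threshold", total)
```

Because the listing is consumed lazily, the traversal stops with the first
file that pushes the total past the threshold, so a very large table costs
only a partial listing instead of a full one.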