Davies Liu created SPARK-15392:
----------------------------------
Summary: The default value of size estimation is not good
Key: SPARK-15392
URL: https://issues.apache.org/jira/browse/SPARK-15392
Project: Spark
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Davies Liu
We use autoBroadcastJoinThreshold + 1L as the default value of size
estimation, that is not good in 2.0, because we will calculate the size based
on size of schema, then the estimation could be less than
autoBroadcastJoinThreshold if you have an SELECT on top of an DataFrame created
from RDD.
We should use an even bigger default value, for example, MaxLong.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]