[ https://issues.apache.org/jira/browse/SPARK-23124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liupengcheng updated SPARK-23124:
---------------------------------
    Description:
When running a SparkSQL thriftserver, we encountered a sudden failure of the thriftserver caused by an OutOfMemoryError.
After reviewing the code and doing some debugging, I found that the framework permits broadcasting a big table and gives no warning. The relevant code is below:

{code:java}
case logical.Join(left, right, joinType, condition) =>
  val buildSide = broadcastSide(canBuildLeft = true, canBuildRight = true, left, right)
  // This join could be very slow or OOM
  joins.BroadcastNestedLoopJoinExec(
    planLater(left), planLater(right), buildSide, joinType, condition) :: Nil

private def broadcastSide(
    canBuildLeft: Boolean,
    canBuildRight: Boolean,
    left: LogicalPlan,
    right: LogicalPlan): BuildSide = {

  def smallerSide =
    if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft

  val buildRight = canBuildRight && right.stats.hints.broadcast
  val buildLeft = canBuildLeft && left.stats.hints.broadcast

  if (buildRight && buildLeft) {
    // Broadcast smaller side base on its estimated physical size
    // if both sides have broadcast hint
    smallerSide
  } else if (buildRight) {
    BuildRight
  } else if (buildLeft) {
    BuildLeft
  } else if (canBuildRight && canBuildLeft) {
    // for the last default broadcast nested loop join
    smallerSide
  } else {
    throw new AnalysisException("Can not decide which side to broadcast for this join")
  }
}
{code}

> Warn users when broadcasting a big table in JoinSelection instead of just running it
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-23124
>                 URL: https://issues.apache.org/jira/browse/SPARK-23124
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0, 2.3.0
>            Reporter: liupengcheng
>            Priority: Major
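The excerpt in the description picks a build side purely by hint or by relative size, so the chosen side can still be arbitrarily large. A minimal sketch of the kind of check the issue title asks for might look like the following. This is not Spark's code: `SideStats`, `BroadcastWarning`, `warningFor`, and the 1 GiB threshold are hypothetical names and values introduced only for illustration, and the `BuildSide` types here are simplified stand-ins for Spark's own.

```scala
// Hypothetical, simplified stand-ins for Spark's planner types; not Spark's real classes.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Estimated size of one join input, playing the role of stats.sizeInBytes in the excerpt.
final case class SideStats(sizeInBytes: BigInt)

object BroadcastWarning {
  // Assumed threshold for illustration only (1 GiB); not a Spark configuration value.
  val WarnThresholdBytes: BigInt = BigInt(1024L * 1024L * 1024L)

  // Same rule as smallerSide in the excerpt: a tie goes to the right side.
  def smallerSide(left: SideStats, right: SideStats): BuildSide =
    if (right.sizeInBytes <= left.sizeInBytes) BuildRight else BuildLeft

  // Returns a warning message when the chosen build side is larger than the threshold,
  // and None otherwise. The caller decides how to log it.
  def warningFor(
      side: BuildSide,
      left: SideStats,
      right: SideStats,
      threshold: BigInt = WarnThresholdBytes): Option[String] = {
    val size = side match {
      case BuildLeft  => left.sizeInBytes
      case BuildRight => right.sizeInBytes
    }
    if (size > threshold)
      Some(s"Broadcasting $side with estimated size $size bytes exceeds " +
        s"$threshold bytes; the join may be slow or run out of memory")
    else None
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val gib = BigInt(1024L * 1024L * 1024L)
    val left = SideStats(gib * 10)  // 10 GiB
    val right = SideStats(gib * 4)  // 4 GiB, smaller but still large
    val side = BroadcastWarning.smallerSide(left, right)
    // Emit the warning, if any, to stderr instead of silently broadcasting.
    BroadcastWarning.warningFor(side, left, right).foreach(msg => Console.err.println(msg))
  }
}
```

Returning an `Option[String]` rather than logging directly keeps the check free of side effects, so a planner could choose whether to log it, surface it to the user, or fail the query.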