[ 
https://issues.apache.org/jira/browse/SPARK-43385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yikaifei updated SPARK-43385:
-----------------------------
    Description: 
The statistics of a Generator node should be the child node's statistics
multiplied by a ratio.

Generator is an expression that produces zero or more rows given a single input
row.

If `UserDefinedGenerator` or `HiveUDTF` is used, the output can be N times the
size of the child node's output, so the estimated statistics are wrong.

Because of the incorrect statistics, Spark may select an unsuitable execution
plan. For example, if `BroadcastHashJoinExec` is selected, the job fails to
broadcast the build side.
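
To make the arithmetic concrete, here is a minimal sketch in plain Python. It is
not Spark's implementation: `Stats`, `generate_stats`, `can_broadcast`, and the
ratio of 5 are hypothetical names and values chosen for illustration. The only
figure taken from Spark is the 10 MB default of
`spark.sql.autoBroadcastJoinThreshold`. It shows how an unscaled estimate can
make a generator's output look small enough to broadcast when a scaled estimate
would not.

```python
from dataclasses import dataclass

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


@dataclass(frozen=True)
class Stats:
    """Hypothetical stand-in for a plan node's size estimate."""
    size_in_bytes: int


def generate_stats(child: Stats, ratio: float) -> Stats:
    """Scale the child's estimate by an assumed rows-out-per-row-in ratio."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    # Keep the estimate at least 1 byte so later arithmetic never sees 0.
    return Stats(max(1, int(child.size_in_bytes * ratio)))


def can_broadcast(stats: Stats, threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> bool:
    """Assumed rule: a side is a broadcast candidate if its estimate fits the threshold."""
    return 0 <= stats.size_in_bytes <= threshold


child = Stats(size_in_bytes=8 * 1024 * 1024)  # 8 MB child estimate
unscaled = generate_stats(child, ratio=1.0)   # generator treated as 1:1
scaled = generate_stats(child, ratio=5.0)     # assumed 5 output rows per input row

print(can_broadcast(unscaled))  # True: 8 MB fits under the 10 MB threshold
print(can_broadcast(scaled))    # False: 40 MB exceeds the threshold
```

With the 1:1 estimate the generator's output appears to fit under the
threshold, so a planner following this rule would consider broadcasting it even
though the real output is about five times larger.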

  was:The Generator's statistics should be ratio times greater than the child 
nodes.


> The Generator's statistics should be ratio times greater than the child nodes
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-43385
>                 URL: https://issues.apache.org/jira/browse/SPARK-43385
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: yikaifei
>            Priority: Minor
>             Fix For: 3.5.0
>
>
> The statistics of a Generator node should be the child node's statistics
> multiplied by a ratio.
> Generator is an expression that produces zero or more rows given a single
> input row.
> If `UserDefinedGenerator` or `HiveUDTF` is used, the output can be N times
> the size of the child node's output, so the estimated statistics are wrong.
> Because of the incorrect statistics, Spark may select an unsuitable execution
> plan. For example, if `BroadcastHashJoinExec` is selected, the job fails to
> broadcast the build side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
