[
https://issues.apache.org/jira/browse/SPARK-55073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55073:
-----------------------------------
Labels: pull-request-available (was: )
> EvalPythonUDTFExec captures SparkPlan and sends it to executor
> --------------------------------------------------------------
>
> Key: SPARK-55073
> URL: https://issues.apache.org/jira/browse/SPARK-55073
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Tim Armstrong
> Priority: Major
> Labels: pull-request-available
>
> The `semanticEquals` call here captures the SparkPlan and sends it to the
> executor:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonUDTFExec.scala
> I can easily fix this; I just need to hoist this logic out of the closure:
> ```
> // flatten all the arguments
> val allInputs = new ArrayBuffer[Expression]
> val dataTypes = new ArrayBuffer[DataType]
> val argMetas = udtf.children.zip(
>   udtf.tableArguments.getOrElse(Seq.fill(udtf.children.length)(false))
> ).map { case (e: Expression, isTableArg: Boolean) =>
>   val (key, value) = e match {
>     case NamedArgumentExpression(key, value) =>
>       (Some(key), value)
>     case _ =>
>       (None, e)
>   }
>   if (allInputs.exists(_.semanticEquals(value))) {
>     ArgumentMetadata(allInputs.indexWhere(_.semanticEquals(value)), key, isTableArg)
>   } else {
>     allInputs += value
>     dataTypes += value.dataType
>     ArgumentMetadata(allInputs.length - 1, key, isTableArg)
>   }
> }.toArray
> ```
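> A rough sketch of the shape of the fix (the surrounding `doExecute` structure
> below is approximated, not the exact code in master, and local names such as
> `flattenedInputs`, `outputTypes` and `childOutput` are illustrative): run the
> argument-flattening on the driver, before the `mapPartitions` closure is
> built, and copy anything else the closure needs into local vals, so the task
> closure captures only those locals instead of the plan node.
> ```
> protected override def doExecute(): RDD[InternalRow] = {
>   // Hoisted out of the closure: this runs on the driver and is the only
>   // place that touches `udtf`, so serializing the task closure no longer
>   // drags in the SparkPlan.
>   val allInputs = new ArrayBuffer[Expression]
>   val dataTypes = new ArrayBuffer[DataType]
>   val argMetas = udtf.children.zip(
>     udtf.tableArguments.getOrElse(Seq.fill(udtf.children.length)(false))
>   ).map { case (e: Expression, isTableArg: Boolean) =>
>     val (key, value) = e match {
>       case NamedArgumentExpression(key, value) => (Some(key), value)
>       case _ => (None, e)
>     }
>     if (allInputs.exists(_.semanticEquals(value))) {
>       ArgumentMetadata(allInputs.indexWhere(_.semanticEquals(value)), key, isTableArg)
>     } else {
>       allInputs += value
>       dataTypes += value.dataType
>       ArgumentMetadata(allInputs.length - 1, key, isTableArg)
>     }
>   }.toArray
>   // Copy everything else the closure needs into plain local values.
>   val flattenedInputs = allInputs.toSeq
>   val outputTypes = dataTypes.toSeq
>   val childOutput = child.output
>
>   child.execute().map(_.copy()).mapPartitions { iter =>
>     // The existing per-partition evaluation goes here; it should reference
>     // only the hoisted locals (argMetas, flattenedInputs, outputTypes,
>     // childOutput) and never `udtf` or `child` directly.
>     iter // placeholder in this sketch
>   }
> }
> ```
> The key point is simply that nothing inside `mapPartitions` references a
> member of the plan, so the closure serializer does not have to ship the
> SparkPlan to the executors.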
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]