[ https://issues.apache.org/jira/browse/SPARK-38981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527322#comment-17527322 ]

Maximilian Sackel commented on SPARK-38981:
-------------------------------------------

To give the minimal working example some more context, I'll try to motivate 
it.
In general, a function should be applied to the rows of a large table that 
belong to a certain category type.
Therefore the task is divided into 3 subtasks:
a) For each row, determine the category type using a UDF.
b) Filter the rows by the category types of interest.
c) Calculate values for those category types using a UDF. If rows that do not 
belong to the category are used in this calculation, an error is thrown.
The error then terminates the whole process.
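The subtasks above can be modelled in plain Python, without Spark, to show why the order of b) and c) matters. This is a minimal sketch under stated assumptions: `categorize`, `calculate`, `calculate_guarded` and the `"positive"` category are hypothetical stand-ins for the real UDFs, and `calculate_then_filter` imitates what happens when the calculation is evaluated before the filter. The guarded variant is a common defensive pattern (the UDF tolerates rows of any category), not something prescribed in this issue.

```python
import math

# Plain-Python model of subtasks a) to c); no Spark involved.
# The category rule and the calculation are hypothetical stand-ins.


def categorize(row):
    """a) Determine the category type of a row (hypothetical rule)."""
    return "positive" if row["x"] > 0 else "non_positive"


def calculate(row):
    """c) Only valid for 'positive' rows; raises for any other category."""
    if categorize(row) != "positive":
        raise ValueError(f"row {row} does not belong to category 'positive'")
    return math.sqrt(row["x"])


def calculate_guarded(row):
    """Defensive variant: returns None instead of raising for other categories."""
    if categorize(row) != "positive":
        return None
    return calculate(row)


def filter_then_calculate(rows, calc):
    # Intended order: b) filter first, then c) calculate on the kept rows.
    kept = [r for r in rows if categorize(r) == "positive"]
    return [calc(r) for r in kept]


def calculate_then_filter(rows, calc):
    # Reordered: c) runs on every row before b) filters the results.
    values = [(r, calc(r)) for r in rows]
    return [v for r, v in values if categorize(r) == "positive"]


rows = [{"x": 4.0}, {"x": -1.0}, {"x": 9.0}]
print(filter_then_calculate(rows, calculate))          # [2.0, 3.0]
print(calculate_then_filter(rows, calculate_guarded))  # [2.0, 3.0]
try:
    calculate_then_filter(rows, calculate)  # fails on the row with x = -1.0
except ValueError as exc:
    print("failed:", exc)
```

Since the optimizer may evaluate the calculation before the filter (as this issue describes), a calculation UDF that tolerates rows of the wrong category avoids the failure whichever plan is chosen.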

Simply adding the rule 
"org.apache.spark.sql.catalyst.optimizer.PushDownPredicate" to the excluded 
rules (spark.sql.optimizer.excludedRules) does not seem to solve the problem. 
[~hyukjin.kwon], it would be really nice if you could point me to the 
appropriate place in the documentation where I can start testing.

Is the basic idea to exclude the optimizer rules for the affected lines and 
then re-enable them, so that the optimizer is used as usual again afterwards?
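One way to switch rules off only around the affected query is to set the `spark.sql.optimizer.excludedRules` session configuration temporarily and restore it afterwards. A minimal sketch, assuming a PySpark `SparkSession` named `spark`; the helper name `excluded_optimizer_rules` is hypothetical. Two hedged caveats: the optimizer runs lazily when an action is executed, so the action has to happen inside the `with` block; and in the Spark 3.x source the predicate push-down rule appears to be named `PushDownPredicates` (plural), so the exact class name is worth checking against the version in use.

```python
from contextlib import contextmanager

EXCLUDED_RULES_KEY = "spark.sql.optimizer.excludedRules"


@contextmanager
def excluded_optimizer_rules(spark, *rule_names):
    """Temporarily add optimizer rules to spark.sql.optimizer.excludedRules.

    Hypothetical helper. The optimizer runs lazily, so the action
    (collect, count, write, ...) must be executed inside the with-block.
    """
    previous = spark.conf.get(EXCLUDED_RULES_KEY, None)
    # Keep any rules that were already excluded and append the new ones.
    combined = ",".join(r for r in [previous, *rule_names] if r)
    spark.conf.set(EXCLUDED_RULES_KEY, combined)
    try:
        yield
    finally:
        # Restore the previous setting so later queries are optimized normally.
        if previous is None:
            spark.conf.unset(EXCLUDED_RULES_KEY)
        else:
            spark.conf.set(EXCLUDED_RULES_KEY, previous)


# Example usage (needs a running SparkSession `spark` and a DataFrame `df`):
# with excluded_optimizer_rules(
#     spark, "org.apache.spark.sql.catalyst.optimizer.PushDownPredicates"
# ):
#     result = df.collect()  # the action runs while the rule is excluded
```

Restoring the previous value in `finally` keeps the exclusion scoped to the one query even if the action raises.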

> Unexpected commutative property of udf/pandas_udf and filters
> -------------------------------------------------------------
>
>                 Key: SPARK-38981
>                 URL: https://issues.apache.org/jira/browse/SPARK-38981
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.2.1
>            Reporter: Maximilian Sackel
>            Priority: Critical
>              Labels: beginner
>         Attachments: optimization_udf_filter.html, screenshot-1.png, 
> screenshot-2.png
>
>
> Hello all,
> When running the minimal working example in the attachments, the optimizer 
> swaps the order of the filter and the UDF. This can lead to errors that are 
> difficult to debug. I have found no reference to such behavior in the 
> documentation. 
> Is this a bug, or a feature that is poorly documented?
> With kind regards,
> Max



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
