[jira] [Updated] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

Gengliang Wang (Jira) Thu, 12 May 2022 08:07:05 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gengliang Wang updated SPARK-39166:
-----------------------------------
    Description: 
In most cases, the work in 
https://issues.apache.org/jira/browse/SPARK-38615 can show where a runtime 
error occurs within the original query.
However, after trying it in production, I found that the following query does 
not show where the divide-by-zero error happens:


{code:sql}
create table aggTest(i int, j int, k int, d date) using parquet;
insert into aggTest values(1, 2, 0, date'2022-01-01');
select sum(j)/sum(k), percentile(i, 0.9) from aggTest group by d;{code}

With the `percentile` function in the query, the plan can't be executed with 
whole-stage codegen (WSCG). Thus the child plan of `Project` is serialized to 
executors for execution, as in `ProjectExec`:


{code:java}
  protected override def doExecute(): RDD[InternalRow] = {
    child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
      val project = UnsafeProjection.create(projectList, child.output)
      project.initialize(index)
      iter.map(project)
    }
  }{code}
Note that `TreeNode.origin` is not serialized to executors because `TreeNode` 
does not extend the trait `Serializable`, which results in an empty query 
context on errors. For more details, see 
https://issues.apache.org/jira/browse/SPARK-39140
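
The serialization behavior described above can be reproduced outside Spark. 
The following is a minimal sketch, not Spark code: `Node`, `Divide` and 
`RoundTrip` are hypothetical stand-ins for `TreeNode`, a Catalyst expression 
and the driver-to-executor shipping step. It relies only on the standard Java 
serialization rule that fields of a non-serializable superclass are not 
written to the stream and are re-initialized by that superclass's no-arg 
constructor on deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for TreeNode: it is NOT Serializable and holds the origin.
abstract class Node {
  // Re-initialized to "" whenever a serializable subclass is deserialized.
  var origin: String = ""
}

// Case classes are Serializable, much like Catalyst expressions.
case class Divide(left: Int, right: Int) extends Node

object RoundTrip {
  // Serialize and deserialize with plain Java serialization,
  // standing in for sending a closure from the driver to an executor.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After setting `origin` on a `Divide` and passing it through 
`RoundTrip.roundTrip`, the copy keeps `left` and `right` but its `origin` is 
the empty string again, which mirrors the empty query context seen on 
executors.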

A naive fix is to make `TreeNode` extend the trait `Serializable`. However, it 
could cause a performance regression if the query text is long (every 
`TreeNode` would carry it during serialization). 
A better fix is to introduce a new trait `SupportQueryContext` and materialize 
the truncated query context for specific expressions. This issue targets 
binary arithmetic expressions only. I will create follow-ups for the remaining 
expressions which support runtime error query context.
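
To illustrate the idea of materializing a truncated context, here is a toy 
sketch under stated assumptions. It is not the actual Spark implementation: 
`Origin`, `CurrentOrigin`, `PlanNode`, `SafeDivide`, `Serde` and the 64 
character limit are all hypothetical, and only the trait name 
`SupportQueryContext` comes from this issue. The point it shows is that a 
short string computed on the driver and stored in a serializable expression 
survives the trip to executors, while the full origin does not.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical, simplified stand-ins; these are not Spark's real classes.
final case class Origin(sqlText: String, start: Int, stop: Int) {
  // Only the fragment that produced the expression, capped at maxLen characters.
  def fragment(maxLen: Int): String = sqlText.substring(start, stop + 1).take(maxLen)
}

object CurrentOrigin {
  // Set while parsing on the driver; stays None on executors.
  @volatile var value: Option[Origin] = None
}

// Not Serializable: `origin` is recomputed from CurrentOrigin on deserialization.
abstract class PlanNode {
  val origin: Option[Origin] = CurrentOrigin.value
}

// Expressions that can fail at runtime opt in and carry a short context string.
trait SupportQueryContext extends Serializable { self: PlanNode =>
  // Materialized once, when the expression is constructed on the driver.
  val queryContext: String = origin.map(_.fragment(64)).getOrElse("")
}

case class SafeDivide(left: Long, right: Long) extends PlanNode with SupportQueryContext {
  def eval(): Long =
    if (right == 0) throw new ArithmeticException(s"Division by zero\n== SQL ==\n$queryContext")
    else left / right
}

object Serde {
  // Plain Java serialization round trip, standing in for driver-to-executor shipping.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In this sketch only the short `queryContext` string is written to the 
stream, so the cost per expression is bounded by the truncation limit rather 
than by the length of the whole query text, which is the trade-off the 
description argues for.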

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --------------------------------------------------------------------------
>
>                 Key: SPARK-39166
>                 URL: https://issues.apache.org/jira/browse/SPARK-39166
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>


