[ https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gengliang Wang updated SPARK-39166:
-----------------------------------
Description:
Currently, in most cases, the work under
https://issues.apache.org/jira/browse/SPARK-38615 is able to show where
runtime errors happen within the original query.
However, after trying it in production, I found that the following queries do
not show where the divide-by-zero error happens (with ANSI mode enabled, i.e.
`spark.sql.ansi.enabled=true`, so that division by zero raises an error instead
of returning NULL):
{code:sql}
create table aggTest(i int, j int, k int, d date) using parquet;
insert into aggTest values(1, 2, 0, date'2022-01-01');
select sum(j)/sum(k), percentile(i, 0.9) from aggTest group by d;{code}
Because the query contains the `percentile` function, the plan cannot be
executed with whole-stage codegen (WSCG). Thus, the child plan of `Project` is
serialized to executors for execution, as in `ProjectExec`:
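{code:scala}
protected override def doExecute(): RDD[InternalRow] = {
  // This closure runs on executors, so it is serialized and shipped together
  // with what it references, including `projectList` and `child.output`.
  child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
    val project = UnsafeProjection.create(projectList, child.output)
    project.initialize(index)
    iter.map(project)
  }
}{code}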
Note that `TreeNode.origin` is not serialized to executors, since `TreeNode`
doesn't extend the trait `Serializable`; this results in an empty query
context on errors. For more details, see
https://issues.apache.org/jira/browse/SPARK-39140
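The loss of `origin` can be reproduced outside Spark with a minimal sketch, assuming plain Java serialization of the task closure. `FakeCurrentOrigin`, `FakeTreeNode`, `FakeDivide` and `roundTrip` are hypothetical stand-ins, not Spark classes. Java serialization only writes the fields of `Serializable` classes; when it deserializes an object whose superclass is not `Serializable`, it re-runs that superclass's no-arg constructor, so a field captured at construction time is re-initialized from whatever the executor currently sees (here, an empty origin).

{code:scala}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object OriginLossDemo {
  // Hypothetical stand-in for CurrentOrigin (thread-local in Spark; a plain var suffices here).
  object FakeCurrentOrigin {
    var value: String = ""
  }

  // Like TreeNode: NOT Serializable, and captures the current origin when constructed.
  abstract class FakeTreeNode {
    val origin: String = FakeCurrentOrigin.value
  }

  // Like a concrete expression: every case class is Serializable.
  case class FakeDivide(left: Int, right: Int) extends FakeTreeNode

  // Java-serializes and deserializes an object, roughly what shipping a task closure does.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val loader = obj.getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)) {
      // Resolve classes with the loader that defined them (helps in REPLs and scripts).
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, loader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns (origin on the "driver", origin after deserialization on an "executor").
  def run(): (String, String) = {
    FakeCurrentOrigin.value = "sum(j)/sum(k)" // set while the driver builds the plan
    val onDriver = FakeDivide(2, 0)
    FakeCurrentOrigin.value = ""              // executors have no current origin
    val onExecutor = roundTrip(onDriver)
    (onDriver.origin, onExecutor.origin)
  }

  def main(args: Array[String]): Unit = {
    val (driverOrigin, executorOrigin) = run()
    println(s"driver:   '$driverOrigin'")
    println(s"executor: '$executorOrigin'")
  }
}
{code}

The `left`/`right` fields of the case class survive the round trip, while `origin`, which lives in the non-serializable base class, comes back empty.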
A naive fix is to make `TreeNode` extend the trait `Serializable`. However, it
can cause a performance regression if the query text is long, since every
`TreeNode` would carry it during serialization.
A better fix is to introduce a new trait, `SupportQueryContext`, and
materialize the truncated query context only for the specific expressions that
report runtime error query context. This JIRA targets binary arithmetic
expressions only; I will create follow-ups for the remaining expressions that
support runtime error query context.
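A sketch of the idea behind `SupportQueryContext`, under the same assumptions as a plain Java-serialization round trip. The trait name `SupportQueryContextSketch`, its members `maxContextLength` and `queryContext`, and the error message format are hypothetical and are not the actual Spark API; the "truncation" is a simple prefix cut, a simplification. The point is that a `val` declared in a trait is stored as a field of the concrete (serializable) case class, so a context string computed on the driver travels with the expression, even though the non-serializable `origin` is still lost.

{code:scala}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object QueryContextDemo {
  // Hypothetical stand-in for CurrentOrigin.
  object FakeCurrentOrigin {
    var value: String = ""
  }

  // Like TreeNode: NOT Serializable, so `origin` is lost on the executor.
  abstract class FakeTreeNode {
    val origin: String = FakeCurrentOrigin.value
  }

  // Hypothetical sketch: copy a truncated context into a field of the
  // Serializable concrete class while still on the driver.
  trait SupportQueryContextSketch { self: FakeTreeNode =>
    def maxContextLength: Int = 64 // hypothetical limit
    val queryContext: String = origin.take(maxContextLength)
  }

  case class FakeDivide(left: Int, right: Int)
    extends FakeTreeNode with SupportQueryContextSketch {
    def eval(): Int =
      if (right == 0) throw new ArithmeticException(s"Division by zero. Query context: $queryContext")
      else left / right
  }

  // Java-serializes and deserializes an object, roughly what shipping a task closure does.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val loader = obj.getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)) {
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, loader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Evaluates a deserialized divide-by-zero; returns (origin, error message).
  def run(): (String, String) = {
    FakeCurrentOrigin.value = "sum(j)/sum(k)" // driver side
    val onDriver = FakeDivide(2, 0)
    FakeCurrentOrigin.value = ""              // executor side
    val onExecutor = roundTrip(onDriver)
    val message =
      try { onExecutor.eval(); "no error" }
      catch { case e: ArithmeticException => e.getMessage }
    (onExecutor.origin, message)
  }

  def main(args: Array[String]): Unit = println(run())
}
{code}

Only the short context string is copied, and only for expressions that mix in the trait, which avoids serializing the full query text with every node.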
> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --------------------------------------------------------------------------
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.7#820007)