[ https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gengliang Wang updated SPARK-39166:
-----------------------------------
    Description: 
Currently, in most cases, the project https://issues.apache.org/jira/browse/SPARK-38615 is able to show where runtime errors happen within the original query.

However, after trying it in production, I found that the following queries don't show where the divide-by-zero error happens:
{code:sql}
create table aggTest(i int, j int, k int, d date) using parquet;
insert into aggTest values(1, 2, 0, date'2022-01-01');
select sum(j)/sum(k), percentile(i, 0.9) from aggTest group by d;{code}
With the `percentile` function in the query, the plan can't execute with whole-stage codegen. Thus the child plan of `Project` is serialized to executors for execution, from ProjectExec:
{code:scala}
protected override def doExecute(): RDD[InternalRow] = {
  child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
    val project = UnsafeProjection.create(projectList, child.output)
    project.initialize(index)
    iter.map(project)
  }
}{code}
Note that `TreeNode.origin` is not serialized to executors, since `TreeNode` doesn't extend the trait `Serializable`. This results in an empty query context on errors. For more details, please read https://issues.apache.org/jira/browse/SPARK-39140

A naive fix is to make `TreeNode` extend the trait `Serializable`. However, this can cause a performance regression if the query text is long, because every `TreeNode` carries it for serialization.

A better fix is to introduce a new trait `SupportQueryContext` and materialize the truncated query context for special expressions. This Jira targets binary arithmetic expressions only. I will create follow-ups for the remaining expressions that support runtime error query context.
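The serialization behavior described above can be reproduced outside Spark. The following is only a minimal sketch under stated assumptions: apart from the trait name `SupportQueryContext`, every name here (`Origin`, `CurrentOrigin`, `Node`, `PlainDivide`, `ContextDivide`, `QueryContextDemo`) is a hypothetical, simplified stand-in and not Spark's actual code. It shows a field of a non-`Serializable` parent being re-initialized on deserialization, while a truncated context materialized in a `Serializable` trait survives the round trip.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in: the position of an expression inside the SQL text.
final case class Origin(sqlText: String = "", start: Int = 0, stop: Int = -1) {
  // Fragment of the query this node came from, capped at maxLen characters.
  def context(maxLen: Int = 32): String =
    if (stop < start) "" else sqlText.substring(start, stop + 1).take(maxLen)
}

// Hypothetical stand-in: set while parsing on the "driver", empty on the "executor".
object CurrentOrigin {
  @volatile var value: Origin = Origin()
}

// Not Serializable: on deserialization its no-arg constructor runs again,
// so `origin` is re-read from CurrentOrigin instead of being restored.
abstract class Node {
  val origin: Origin = CurrentOrigin.value
}

// Case classes are Serializable, but the inherited `origin` is not written out.
case class PlainDivide(left: Int, right: Int) extends Node

// Assumed shape of the fix: materialize a truncated context eagerly, as a
// field that belongs to the Serializable subclass.
trait SupportQueryContext extends Serializable { self: Node =>
  val queryContext: String = origin.context()
}

case class ContextDivide(left: Int, right: Int) extends Node with SupportQueryContext

object QueryContextDemo {
  // Java-serialize an object to bytes and read it back.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val sql = "select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d"
    // "Driver": characters 7 to 19 of the query text are `sum(j)/sum(k)`.
    CurrentOrigin.value = Origin(sql, 7, 19)
    val plain = PlainDivide(2, 0)
    val withContext = ContextDivide(2, 0)

    // "Executor": no origin is available when the objects are deserialized.
    CurrentOrigin.value = Origin()
    println(s"plain: '${roundTrip(plain).origin.context()}'")
    println(s"with context: '${roundTrip(withContext).queryContext}'")
  }
}
```

In this sketch the first object comes back with an empty context, while the second still carries `sum(j)/sum(k)`. Only the short fragment is serialized, not the whole query text, which is the trade-off the description argues for.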
> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --------------------------------------------------------------------------
>
>                 Key: SPARK-39166
>                 URL: https://issues.apache.org/jira/browse/SPARK-39166
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major