[ 
https://issues.apache.org/jira/browse/SPARK-57547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57547:
-----------------------------------
    Labels: pull-request-available  (was: )

> Incorrect InMemoryRelation materialization under conrrent queries
> -----------------------------------------------------------------
>
>                 Key: SPARK-57547
>                 URL: https://issues.apache.org/jira/browse/SPARK-57547
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Ziqi Liu
>            Priority: Major
>              Labels: pull-request-available
>
> * AQE creates a separate `TableCacheQueryStageExec` for every reference to 
> the same `df.cache` (never reused), and each one submits its own build job 
> over the **shared cache RDD**.
>  * When concurrent queries reference the same cached relation, first-touches 
> the cold cache from several jobs at once. Spark has no global, cross-executor 
> "compute this partition once" barrier (only a per-executor write lock), so 
> the same partition can be computed by multiple executors. 
> `CachedRDDBuilder.isCachedRDDLoaded` decided the cache was materialized by 
> comparing the partition count against a **raw task-completion count**. 
> Duplicate completions of an empty-output partition could push that count to 
> the partition total while a row-producing partition was still being built, so 
> the cache latched as "loaded" with `rowCount == 0`.
>  * `AQEPropagateEmptyRelation` then ("correctly", given the stats it was 
> told) collapsed the cache branch to an `EmptyRelation`, and that empty 
> propagated through the plan -- the `MERGE` saw `numSourceRows = 0` and 
> committed nothing, with no error.
>  * Additional latent bugs:
>  ** size/rows accumulators could be over-counted
>  ** no accumulators reset upon `clearCache`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to