[
https://issues.apache.org/jira/browse/SPARK-57547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ziqi Liu updated SPARK-57547:
-----------------------------
Description:
* AQE creates a separate `TableCacheQueryStageExec` for every reference to the
same `df.cache` (never reused), and each one submits its own build job over the
*{*}shared cache RDD{*}*.
* When concurrent queries reference the same cached relation, first-touches
the cold cache from several jobs at once. Spark has no global, cross-executor
"compute this partition once" barrier (only a per-executor write lock), so the
same partition can be computed by multiple executors.
`CachedRDDBuilder.isCachedRDDLoaded` decided the cache was materialized by
comparing the partition count against a *{*}raw task-completion count{*}*.
Duplicate completions of an empty-output partition could push that count to the
partition total while a row-producing partition was still being built, so the
cache latched as "loaded" with `rowCount == 0`.
* `AQEPropagateEmptyRelation` then ("correctly", given the stats it was told)
collapsed the cache branch to an `EmptyRelation`, and that empty propagated
through the plan.
* Additional latent bugs:
** size/rows accumulators could be over-counted
** no accumulators reset upon `clearCache`
was:
* AQE creates a separate `TableCacheQueryStageExec` for every reference to the
same `df.cache` (never reused), and each one submits its own build job over the
**shared cache RDD**.
* When concurrent queries reference the same cached relation, first-touches
the cold cache from several jobs at once. Spark has no global, cross-executor
"compute this partition once" barrier (only a per-executor write lock), so the
same partition can be computed by multiple executors.
`CachedRDDBuilder.isCachedRDDLoaded` decided the cache was materialized by
comparing the partition count against a **raw task-completion count**.
Duplicate completions of an empty-output partition could push that count to the
partition total while a row-producing partition was still being built, so the
cache latched as "loaded" with `rowCount == 0`.
* `AQEPropagateEmptyRelation` then ("correctly", given the stats it was told)
collapsed the cache branch to an `EmptyRelation`, and that empty propagated
through the plan -- the `MERGE` saw `numSourceRows = 0` and committed nothing,
with no error.
* Additional latent bugs:
** size/rows accumulators could be over-counted
** no accumulators reset upon `clearCache`
> Incorrect InMemoryRelation materialization under conrrent queries
> -----------------------------------------------------------------
>
> Key: SPARK-57547
> URL: https://issues.apache.org/jira/browse/SPARK-57547
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Ziqi Liu
> Priority: Major
> Labels: pull-request-available
>
> * AQE creates a separate `TableCacheQueryStageExec` for every reference to
> the same `df.cache` (never reused), and each one submits its own build job
> over the *{*}shared cache RDD{*}*.
> * When concurrent queries reference the same cached relation, first-touches
> the cold cache from several jobs at once. Spark has no global, cross-executor
> "compute this partition once" barrier (only a per-executor write lock), so
> the same partition can be computed by multiple executors.
> `CachedRDDBuilder.isCachedRDDLoaded` decided the cache was materialized by
> comparing the partition count against a *{*}raw task-completion count{*}*.
> Duplicate completions of an empty-output partition could push that count to
> the partition total while a row-producing partition was still being built, so
> the cache latched as "loaded" with `rowCount == 0`.
> * `AQEPropagateEmptyRelation` then ("correctly", given the stats it was
> told) collapsed the cache branch to an `EmptyRelation`, and that empty
> propagated through the plan.
> * Additional latent bugs:
> ** size/rows accumulators could be over-counted
> ** no accumulators reset upon `clearCache`
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]