[ https://issues.apache.org/jira/browse/SPARK-47398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-47398:
-----------------------------------
    Labels: pull-request-available  (was: )

> AQE doesn't allow for extension of InMemoryTableScanExec
> --------------------------------------------------------
>
>                 Key: SPARK-47398
>                 URL: https://issues.apache.org/jira/browse/SPARK-47398
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.5.1
>            Reporter: Raza Jafri
>            Priority: Major
>              Labels: pull-request-available
>
> As part of SPARK-42101, we added support to AQE for handling
> InMemoryTableScanExec.
> That change references `InMemoryTableScanExec` directly, which prevents users
> from extending the caching functionality that was added as part of
> SPARK-32274.
> In `AdaptiveSparkPlanExec` we wrap `InMemoryTableScanExec` in
> `TableCacheQueryStageExec`. To do this we currently match on the concrete
> Exec. I propose that we match on a trait instead, just as we do for
> `Exchange` by matching against `ShuffleExchangeLike` and
> `BroadcastExchangeLike`. In the RAPIDS Accelerator for Apache Spark, we
> replace `InMemoryTableScanExec` with our own version, which does some
> optimizations. With the current matching this could cause a problem: the
> benefits of SPARK-42101 might be lost or, in the worst case, we could look
> for the said Exec, fail to find it, and throw an exception.
>
> Looking at the current code, I propose the following trait:
> {code:java}
> trait InMemoryTableScanLike extends LeafExecNode {
>   /**
>    * Returns whether the cache buffer is loaded
>    */
>   def isMaterialized: Boolean
>
>   /**
>    * Returns the actual cached RDD without filters and serialization of
>    * row/columnar.
>    */
>   def baseCacheRDD(): RDD[CachedBatch]
>
>   /**
>    * Returns the runtime statistics after shuffle materialization.
>    */
>   def runtimeStatistics: Statistics
> }
> {code}
> This is based only on what I know about how AQE uses it.
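
The difference between matching on the concrete Exec and matching on a trait can be illustrated with a small, Spark-free sketch. Everything here is a simplified stand-in and an assumption made for illustration: `Plan`, `BuiltInScan`, `PluginScan`, `OtherNode`, `TableCacheStage`, `wrapByClass`, and `wrapByTrait` are hypothetical names, not Spark classes or methods, and the real logic in `AdaptiveSparkPlanExec` is more involved.

```scala
// Simplified stand-ins for physical plan nodes (hypothetical, not Spark's classes).
trait Plan

// Stand-in for the proposed trait, reduced to a single member.
trait InMemoryTableScanLike extends Plan {
  def isMaterialized: Boolean
}

// Stand-in for the built-in InMemoryTableScanExec.
final case class BuiltInScan(isMaterialized: Boolean) extends InMemoryTableScanLike

// Stand-in for a plugin-provided replacement scan.
final case class PluginScan(isMaterialized: Boolean) extends InMemoryTableScanLike

// Any other plan node.
final case class OtherNode(name: String) extends Plan

// Stand-in for TableCacheQueryStageExec: wraps a cache scan in a query stage.
final case class TableCacheStage(id: Int, scan: InMemoryTableScanLike) extends Plan

object TraitMatchDemo {
  // Matching on the concrete class: only the built-in scan is wrapped.
  def wrapByClass(plan: Plan, id: Int): Plan = plan match {
    case scan: BuiltInScan => TableCacheStage(id, scan)
    case other             => other
  }

  // Matching on the trait: any implementation, including a plugin's, is wrapped.
  def wrapByTrait(plan: Plan, id: Int): Plan = plan match {
    case scan: InMemoryTableScanLike => TableCacheStage(id, scan)
    case other                       => other
  }

  def main(args: Array[String]): Unit = {
    val plugin = PluginScan(isMaterialized = false)
    println(wrapByClass(plugin, 0)) // plugin scan is returned unwrapped
    println(wrapByTrait(plugin, 0)) // plugin scan is wrapped in a stage
  }
}
```

In this sketch the class-based match silently skips the plugin's scan, while the trait-based match treats it like the built-in one. This mirrors the approach the issue describes for `ShuffleExchangeLike` and `BroadcastExchangeLike`.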