[
https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norio Akagi updated SPARK-57064:
--------------------------------
Description:
What changes were proposed in this pull request?
DisableUnnecessaryBucketedScan and CoalesceBucketsInJoin pattern-match on the
concrete class FileSourceScanExec in several read-only match sites where only
trait-level fields (bucketedScan, relation, optionalNumCoalescedBuckets) are
accessed. The FileSourceScanLike trait already declares all of these
fields, so the matches can safely be widened.
This PR changes 3 match sites from FileSourceScanExec to FileSourceScanLike:
- DisableUnnecessaryBucketedScan.apply — the hasBucketedScan existence check
- ExtractJoinWithBuckets.hasScanOperation — the bucket spec existence check
- ExtractJoinWithBuckets.getBucketSpec — the bucket spec extraction
Two match sites that call .copy() (a case-class-specific method) are
intentionally left on FileSourceScanExec.
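
The split between widened and unchanged match sites can be illustrated with a minimal sketch. The types below (ScanLikeSketch, ScanExecSketch, PluginScanSketch, MatchSites) are hypothetical stand-ins invented for this example, not the real Spark classes, and the method bodies are simplified assumptions rather than the actual rule code. The sketch only shows why a read-only site can match on the trait while a site that calls .copy() has to stay on the case class.

```scala
// Hypothetical, simplified stand-ins for illustration only (not Spark's real classes).
trait ScanLikeSketch {
  def bucketedScan: Boolean
  def optionalNumCoalescedBuckets: Option[Int]
}

// Stand-in for the concrete case class: the compiler generates copy() for it.
final case class ScanExecSketch(
    bucketedScan: Boolean,
    optionalNumCoalescedBuckets: Option[Int]) extends ScanLikeSketch

// Stand-in for a plugin operator: a plain class, so there is no generated copy().
final class PluginScanSketch(
    val bucketedScan: Boolean,
    val optionalNumCoalescedBuckets: Option[Int]) extends ScanLikeSketch

object MatchSites {
  // Read-only site: only trait-level members are read, so matching on the trait is safe.
  def hasBucketedScan(plan: Seq[AnyRef]): Boolean = plan.exists {
    case s: ScanLikeSketch => s.bucketedScan
    case _ => false
  }

  // Rewriting site: copy() exists only on the case class, so the match stays concrete.
  def disableBucketedScan(node: AnyRef): AnyRef = node match {
    case s: ScanExecSketch if s.bucketedScan => s.copy(bucketedScan = false)
    case other => other
  }
}
```

In this sketch the plugin operator is seen by hasBucketedScan but passes through disableBucketedScan untouched, which mirrors the scope the description gives: read-only sites are widened, rewriting sites are not.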
Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
FileSourceScanExec with their own scan operators that extend
FileSourceScanLike. With the current concrete-class matches, these plugins'
scan operators are invisible to the bucketing rules.
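
A small sketch of what "invisible" means here, under stated assumptions: BucketSpecSketch, RelationSketch, BucketedSource, VanillaScan, ThirdPartyScan and BucketSpecLookup are hypothetical names made up for this example, and the two lookup functions are simplified models of a bucket spec extraction, not the code in ExtractJoinWithBuckets.

```scala
// Hypothetical, simplified model for illustration only (not Spark's real classes).
final case class BucketSpecSketch(numBuckets: Int)
final case class RelationSketch(bucketSpec: Option[BucketSpecSketch])

// Stand-in for the trait that both built-in and plugin scans extend.
trait BucketedSource {
  def relation: RelationSketch
}

// Stand-in for the built-in concrete scan operator.
final case class VanillaScan(relation: RelationSketch) extends BucketedSource

// Stand-in for a plugin's replacement scan operator.
final class ThirdPartyScan(val relation: RelationSketch) extends BucketedSource

object BucketSpecLookup {
  // Concrete-class match: any operator that is not VanillaScan falls through to None.
  def concreteMatch(node: AnyRef): Option[BucketSpecSketch] = node match {
    case s: VanillaScan => s.relation.bucketSpec
    case _ => None
  }

  // Trait-level match: every operator extending the trait is considered.
  def traitMatch(node: AnyRef): Option[BucketSpecSketch] = node match {
    case s: BucketedSource => s.relation.bucketSpec
    case _ => None
  }
}
```

For a ThirdPartyScan over a bucketed relation, concreteMatch yields None even though a bucket spec is present, while traitMatch returns it. For VanillaScan both functions agree, which is consistent with the claim that vanilla behavior does not change.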
This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE
hardcoding concrete classes instead of traits), but in the bucketing physical
rules, which were not covered by those fixes.
Does this PR introduce any user-facing change?
No. FileSourceScanExec already extends FileSourceScanLike, so behavior is
unchanged for vanilla Spark.
was:
### What changes were proposed in this pull request?
`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` pattern-match on
the concrete class `FileSourceScanExec` in several read-only match sites where
only trait-level fields (`bucketedScan`, `relation`,
`optionalNumCoalescedBuckets`) are accessed. The `FileSourceScanLike` trait
already declares all of these fields, so the matches can safely be widened.
This PR changes 3 match sites from `FileSourceScanExec` to
`FileSourceScanLike`:
- `DisableUnnecessaryBucketedScan.apply` — the `hasBucketedScan` existence
check
- `ExtractJoinWithBuckets.hasScanOperation` — the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec` — the bucket spec extraction
Two match sites that call `.copy()` (a case-class-specific method) are
intentionally left on `FileSourceScanExec`.
### Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
`FileSourceScanExec` with their own scan operators that extend
`FileSourceScanLike`. With the current concrete-class matches, these plugins'
scan operators are invisible to the bucketing rules:
`DisableUnnecessaryBucketedScan` never finds them and `ExtractJoinWithBuckets`
never extracts their bucket specs.
This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE
hardcoding concrete classes instead of traits), but in the bucketing physical
rules, which were not covered by those fixes.
### Does this PR introduce _any_ user-facing change?
No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now
be recognized by the bucketing rules.
> Bucketing rules should match on FileSourceScanLike trait instead of
> FileSourceScanExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-57064
> URL: https://issues.apache.org/jira/browse/SPARK-57064
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Norio Akagi
> Priority: Minor
>