[https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Norio Akagi updated SPARK-57064:
--------------------------------
Description:
h3. What changes were proposed in this pull request?
{{DisableUnnecessaryBucketedScan}} and {{CoalesceBucketsInJoin}} (through the
{{ExtractJoinWithBuckets}} extractor it uses) pattern-match on the concrete
class {{FileSourceScanExec}} at several read-only match sites. These sites
access only trait-level fields ({{bucketedScan}}, {{relation}},
{{optionalNumCoalescedBuckets}}). The {{FileSourceScanLike}} trait already
declares all of these fields, so these matches can safely be widened.
This PR changes three match sites from {{FileSourceScanExec}} to
{{FileSourceScanLike}}:
- {{DisableUnnecessaryBucketedScan.apply}}: the {{hasBucketedScan}} existence check
- {{ExtractJoinWithBuckets.hasScanOperation}}: the bucket spec existence check
- {{ExtractJoinWithBuckets.getBucketSpec}}: the bucket spec extraction
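The following is a minimal sketch of what the three widened read-only matches amount to. It relies on several assumptions. {{ScanLike}}, {{VanillaScan}}, {{PluginScan}}, {{Relation}}, {{BucketSpec}}, {{Filter}}, {{Project}} and {{Join}} are hypothetical stand-ins for {{FileSourceScanLike}}, {{FileSourceScanExec}}, a plugin scan operator, and Spark's plan nodes. The three functions are loosely modeled on the match sites listed above and are not the actual Spark code.

```scala
// Hypothetical stand-ins, NOT Spark's real classes:
// ScanLike ~ FileSourceScanLike, VanillaScan ~ FileSourceScanExec,
// PluginScan ~ a third-party scan operator that only extends the trait.
final case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])
final case class Relation(bucketSpec: Option[BucketSpec])

trait Plan { def children: Seq[Plan] }

trait ScanLike extends Plan {
  def relation: Relation
  def bucketedScan: Boolean
  def optionalNumCoalescedBuckets: Option[Int]
  def children: Seq[Plan] = Nil
}

final case class VanillaScan(
    relation: Relation,
    bucketedScan: Boolean,
    optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

final class PluginScan(
    val relation: Relation,
    val bucketedScan: Boolean,
    val optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

final case class Filter(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
final case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
final case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

// Hypothetical tree helpers (Spark's TreeNode provides its own versions).
def exists(plan: Plan)(p: Plan => Boolean): Boolean =
  p(plan) || plan.children.exists(c => exists(c)(p))

def collectFirst[A](plan: Plan)(pf: PartialFunction[Plan, A]): Option[A] =
  if (pf.isDefinedAt(plan)) Some(pf(plan))
  else plan.children.iterator.map(c => collectFirst(c)(pf)).find(_.isDefined).flatten

// Like the hasBucketedScan check: any operator extending the trait counts.
def hasBucketedScan(plan: Plan): Boolean = exists(plan) {
  case scan: ScanLike => scan.bucketedScan
  case _ => false
}

// Like hasScanOperation: look through Filter/Project for a scan with a bucket spec.
def hasScanOperation(plan: Plan): Boolean = plan match {
  case Filter(child) => hasScanOperation(child)
  case Project(child) => hasScanOperation(child)
  case scan: ScanLike => scan.relation.bucketSpec.nonEmpty
  case _ => false
}

// Like getBucketSpec: the first scan that has a bucket spec and is not yet coalesced.
def getBucketSpec(plan: Plan): Option[BucketSpec] = collectFirst(plan) {
  case scan: ScanLike if scan.relation.bucketSpec.nonEmpty &&
      scan.optionalNumCoalescedBuckets.isEmpty =>
    scan.relation.bucketSpec.get
}
```

Because each function matches on {{ScanLike}}, a {{PluginScan}} is handled exactly like a {{VanillaScan}}. None of the read-only checks mentions the concrete class.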
Two match sites that call {{.copy()}}, a case-class method that
{{FileSourceScanLike}} does not declare, are intentionally left on
{{FileSourceScanExec}}.
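The same style of sketch shows why the two {{.copy()}} sites cannot be widened. It uses these hypothetical stand-ins: {{ScanLike}} for {{FileSourceScanLike}}, {{VanillaScan}} for {{FileSourceScanExec}}, {{PluginScan}} for a plugin operator, and an illustrative {{disableBucketedScan}} rewrite. Only a case class gets a compiler-generated {{copy}}. A rewrite that builds a modified scan therefore has to match the concrete class.

```scala
// Hypothetical stand-ins, NOT Spark's real classes:
// ScanLike ~ FileSourceScanLike, VanillaScan ~ FileSourceScanExec.
trait ScanLike {
  def table: String
  def bucketedScan: Boolean
}

// A case class gets a compiler-generated copy(...) method.
final case class VanillaScan(table: String, bucketedScan: Boolean) extends ScanLike

// A plugin operator that only extends the trait has no copy(...) to call.
final class PluginScan(val table: String, val bucketedScan: Boolean) extends ScanLike

// Illustrative rewrite: switching bucketing off builds a modified scan via
// copy(...). ScanLike declares no copy, so `case s: ScanLike => s.copy(...)`
// would not compile; this match has to stay on the concrete case class.
def disableBucketedScan(scan: ScanLike): ScanLike = scan match {
  case s: VanillaScan if s.bucketedScan => s.copy(bucketedScan = false)
  case other => other // anything else, including plugin scans, is returned as-is
}
```

Under this sketch, plugin scans pass through the rewrite untouched.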
h3. Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
{{FileSourceScanExec}} with their own scan operators that extend
{{FileSourceScanLike}}. Because the current matches name the concrete class,
these plugins' scan operators are invisible to the bucketing rules:
{{DisableUnnecessaryBucketedScan}} never finds them, and
{{ExtractJoinWithBuckets}} never extracts their bucket specs.
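The following is a minimal sketch of this failure mode. It uses these hypothetical stand-ins: {{ScanLike}} for {{FileSourceScanLike}}, {{VanillaScan}} for {{FileSourceScanExec}}, and {{PluginScan}} for a plugin's scan operator. A match on the concrete class skips the plugin operator. A match on the trait finds it and still gives the same answer for the built-in scan.

```scala
// Hypothetical stand-ins, NOT Spark's real classes:
// ScanLike ~ FileSourceScanLike, VanillaScan ~ FileSourceScanExec,
// PluginScan ~ a plugin scan operator that only extends the trait.
trait ScanLike { def bucketedScan: Boolean }
final case class VanillaScan(bucketedScan: Boolean) extends ScanLike
final class PluginScan(val bucketedScan: Boolean) extends ScanLike

// Before: a match on the concrete class never sees plugin operators.
def isBucketedConcrete(node: Any): Boolean = node match {
  case s: VanillaScan => s.bucketedScan
  case _ => false
}

// After: a match on the trait sees every operator that extends it,
// and behaves identically for the built-in scan.
def isBucketedWidened(node: Any): Boolean = node match {
  case s: ScanLike => s.bucketedScan
  case _ => false
}
```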
This is the same class of issue that SPARK-32332 and SPARK-32430 addressed
(AQE hardcoding concrete classes instead of traits), but in the bucketing
physical rules, which those fixes did not cover.
h3. Does this PR introduce any user-facing change?
No. {{FileSourceScanExec}} already extends {{FileSourceScanLike}}, so
behavior is unchanged for vanilla Spark. Plugins that extend
{{FileSourceScanLike}} will now be recognized by the bucketing rules.
was:
**Summary:** `Bucketing rules should match on FileSourceScanLike trait instead of FileSourceScanExec`
**Description:**
### What changes were proposed in this pull request?
`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` pattern-match on
the concrete class `FileSourceScanExec` in several read-only match sites where
only trait-level fields (`bucketedScan`, `relation`,
`optionalNumCoalescedBuckets`) are accessed. The `FileSourceScanLike` trait
already declares all of these fields, so the matches can safely be widened.
This PR changes 3 match sites from `FileSourceScanExec` to `FileSourceScanLike`:
- `DisableUnnecessaryBucketedScan.apply` — the `hasBucketedScan` existence check
- `ExtractJoinWithBuckets.hasScanOperation` — the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec` — the bucket spec extraction
Two match sites that call `.copy()` (a case-class-specific method) are
intentionally left on `FileSourceScanExec`.
### Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
`FileSourceScanExec` with their own scan operators that extend
`FileSourceScanLike`. With the current concrete-class matches, these plugins'
scan operators are invisible to the bucketing rules —
`DisableUnnecessaryBucketedScan` never finds them and `ExtractJoinWithBuckets`
never extracts their bucket specs.
This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE
hardcoding concrete classes instead of traits), but in the bucketing physical
rules which were not covered by those fixes.
### Does this PR introduce _any_ user-facing change?
No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now
be recognized by the bucketing rules.
> Bucketing rules should match on FileSourceScanLike trait instead of FileSourceScanExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-57064
> URL: https://issues.apache.org/jira/browse/SPARK-57064
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Norio Akagi
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.20.10#820010)