[
https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norio Akagi updated SPARK-57064:
--------------------------------
Description:
h3. What changes were proposed in this pull request?
DisableUnnecessaryBucketedScan and CoalesceBucketsInJoin pattern-match on the
concrete class FileSourceScanExec in several read-only match sites where only
trait-level fields (bucketedScan, relation, optionalNumCoalescedBuckets) are
accessed. The FileSourceScanLike trait already declares all of these
fields, so the matches can safely be widened.
This PR changes three match sites from FileSourceScanExec to FileSourceScanLike:
- DisableUnnecessaryBucketedScan.apply: the hasBucketedScan existence check
- ExtractJoinWithBuckets.hasScanOperation: the bucket spec existence check
- ExtractJoinWithBuckets.getBucketSpec: the bucket spec extraction
Two match sites that call .copy(), a method available only on the case class,
are intentionally left on FileSourceScanExec.
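The difference between the read-only sites and the .copy() sites can be
sketched with simplified stand-in types. Everything below (Plan, ScanLike,
VanillaScan, PluginScan, Project and the helper functions) is hypothetical and
only mirrors the shape of the change; it is not the actual Spark code. It is
written as a Scala script.

```scala
// Simplified stand-ins for illustration only; these are NOT the real Spark classes.
trait Plan { def children: Seq[Plan] }

// Stand-in for the FileSourceScanLike trait: exposes the read-only field.
trait ScanLike extends Plan {
  def bucketedScan: Boolean
  def children: Seq[Plan] = Nil
}

// Stand-in for FileSourceScanExec: a case class, so the compiler generates copy().
final case class VanillaScan(bucketedScan: Boolean) extends ScanLike

// Hypothetical plugin scan: extends the trait but is not a case class.
final class PluginScan(val bucketedScan: Boolean) extends ScanLike

final case class Project(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

// True if any node in the tree satisfies the predicate.
def planExists(plan: Plan)(p: Plan => Boolean): Boolean =
  p(plan) || plan.children.exists(c => planExists(c)(p))

// Before: a read-only check that matches only the concrete class.
def hasBucketedScanNarrow(plan: Plan): Boolean = planExists(plan) {
  case s: VanillaScan => s.bucketedScan
  case _ => false
}

// After: the same check widened to anything that extends the trait.
def hasBucketedScanWide(plan: Plan): Boolean = planExists(plan) {
  case s: ScanLike => s.bucketedScan
  case _ => false
}

// A rewriting site has to stay on the case class, because copy() is
// not declared on the trait. Trait-only scans pass through unchanged.
def disableBucketedScan(plan: Plan): Plan = plan match {
  case s: VanillaScan => s.copy(bucketedScan = false)
  case Project(child) => Project(disableBucketedScan(child))
  case other => other
}

val vanillaPlan: Plan = Project(VanillaScan(bucketedScan = true))
val pluginPlan: Plan = Project(new PluginScan(bucketedScan = true))
```

In this sketch the narrow check finds the bucketed scan in vanillaPlan but not
in pluginPlan, while the widened check finds it in both. disableBucketedScan
still rewrites only the case-class scan, which is why those two sites are left
alone.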
h3. Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
FileSourceScanExec with their own scan operators that extend
FileSourceScanLike. With the current concrete-class matches, these plugins'
scan operators are invisible to the bucketing rules.
This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE
hardcoding concrete classes instead of traits), but in the bucketing physical
rules, which those fixes did not cover.
h3. Does this PR introduce any user-facing change?
No. FileSourceScanExec already extends FileSourceScanLike, so behavior is
unchanged for vanilla Spark.
> Bucketing rules should match on FileSourceScanLike trait instead of
> FileSourceScanExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-57064
> URL: https://issues.apache.org/jira/browse/SPARK-57064
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Norio Akagi
> Priority: Minor