[ 
https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norio Akagi updated SPARK-57064:
--------------------------------
    Description: 
h3. What changes were proposed in this pull request?

DisableUnnecessaryBucketedScan and CoalesceBucketsInJoin pattern-match on the 
concrete class FileSourceScanExec in several read-only match sites where only 
trait-level fields (bucketedScan, relation, optionalNumCoalescedBuckets) are 
accessed. The FileSourceScanLike trait already declares all of these
fields, so the matches can safely be widened.

This PR changes 3 match sites from FileSourceScanExec to FileSourceScanLike:
 - DisableUnnecessaryBucketedScan.apply — the hasBucketedScan existence check
 - ExtractJoinWithBuckets.hasScanOperation — the bucket spec existence check
 - ExtractJoinWithBuckets.getBucketSpec — the bucket spec extraction

Two match sites that call .copy() (a case-class-specific method) are 
intentionally left on FileSourceScanExec.
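
The split between the widened sites and the two untouched ones can be sketched as follows. This is a minimal illustration, not Spark's code: Plan, Project, and the reduced field lists are simplified stand-ins assumed for the example, and only the names FileSourceScanExec, FileSourceScanLike, bucketedScan, and optionalNumCoalescedBuckets come from the description above.

```scala
object WidenedMatchSketch {
  // Simplified stand-in for a physical plan node (assumption, not Spark's SparkPlan).
  trait Plan { def children: Seq[Plan] }

  // Stand-in for the trait: it declares the fields the read-only sites need.
  trait FileSourceScanLike extends Plan {
    def bucketedScan: Boolean
    def optionalNumCoalescedBuckets: Option[Int]
    override def children: Seq[Plan] = Nil
  }

  // Stand-in for the concrete case class; copy() is generated only here.
  final case class FileSourceScanExec(
      bucketedScan: Boolean,
      optionalNumCoalescedBuckets: Option[Int] = None) extends FileSourceScanLike

  // Hypothetical unary operator so the sketch has a tree to traverse.
  final case class Project(child: Plan) extends Plan {
    override def children: Seq[Plan] = Seq(child)
  }

  // Read-only site: only a trait-level field is read, so matching the trait suffices.
  def hasBucketedScan(plan: Plan): Boolean = plan match {
    case scan: FileSourceScanLike => scan.bucketedScan
    case other => other.children.exists(hasBucketedScan)
  }

  // Rewriting site: copy() is not declared on the trait, so the match stays concrete.
  def disableBucketedScan(plan: Plan): Plan = plan match {
    case scan: FileSourceScanExec if scan.bucketedScan => scan.copy(bucketedScan = false)
    case Project(child) => Project(disableBucketedScan(child))
    case other => other
  }
}
```

In this sketch, changing the type pattern in hasBucketedScan costs nothing because the body never needs more than the trait offers, whereas disableBucketedScan would not compile against the trait without adding a copy-like method to it.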
h3. Why are the changes needed?

Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
FileSourceScanExec with their own scan operators that extend 
FileSourceScanLike. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules.
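
A small sketch of why such operators go unseen, under stated assumptions: PluginNativeScanExec is a hypothetical name invented for this example (it is not an actual Gluten, Comet, or RAPIDS class), and the trait is reduced to a single field.

```scala
object PluginScanSketch {
  // Reduced stand-in for the trait (assumption: one field is enough for the demo).
  trait FileSourceScanLike { def bucketedScan: Boolean }

  // Stand-in for Spark's built-in scan operator.
  final case class FileSourceScanExec(bucketedScan: Boolean) extends FileSourceScanLike

  // Hypothetical plugin operator: extends the trait but is not the case class.
  final class PluginNativeScanExec(val bucketedScan: Boolean) extends FileSourceScanLike

  // Concrete-class match: anything that is not FileSourceScanExec falls through.
  def seenByConcreteMatch(node: AnyRef): Boolean = node match {
    case s: FileSourceScanExec => s.bucketedScan
    case _ => false
  }

  // Trait-level match: any operator implementing the trait is recognized.
  def seenByTraitMatch(node: AnyRef): Boolean = node match {
    case s: FileSourceScanLike => s.bucketedScan
    case _ => false
  }
}
```

Both functions agree on FileSourceScanExec; they differ only for operators that implement the trait without being the case class, which is the situation the plugins are in.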

This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but it occurs in the bucketing 
physical rules, which those fixes did not cover.
h3. Does this PR introduce any user-facing change?

No. FileSourceScanExec already extends FileSourceScanLike, so behavior is 
unchanged for vanilla Spark.

  was:
What changes were proposed in this pull request?

  DisableUnnecessaryBucketedScan and CoalesceBucketsInJoin pattern-match on the 
concrete class FileSourceScanExec in several read-only match sites where only 
trait-level fields (bucketedScan, relation, optionalNumCoalescedBuckets) are 
accessed. The FileSourceScanLike trait already declares all of these
  fields, so the matches can safely be widened.

  This PR changes 3 match sites from FileSourceScanExec to FileSourceScanLike:
  - DisableUnnecessaryBucketedScan.apply — the hasBucketedScan existence check
  - ExtractJoinWithBuckets.hasScanOperation — the bucket spec existence check
  - ExtractJoinWithBuckets.getBucketSpec — the bucket spec extraction

  Two match sites that call .copy() (a case-class-specific method) are 
intentionally left on FileSourceScanExec.

  Why are the changes needed?

  Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
FileSourceScanExec with their own scan operators that extend 
FileSourceScanLike. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules.

  This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules which were not covered by those fixes.

  Does this PR introduce any user-facing change?

  No. FileSourceScanExec already extends FileSourceScanLike, so behavior is 
unchanged for vanilla Spark.



> Bucketing rules should match on FileSourceScanLike trait instead of 
> FileSourceScanExec
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-57064
>                 URL: https://issues.apache.org/jira/browse/SPARK-57064
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Norio Akagi
>            Priority: Minor


