[ 
https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norio Akagi updated SPARK-57064:
--------------------------------
    Description: 
What changes were proposed in this pull request?

  DisableUnnecessaryBucketedScan and CoalesceBucketsInJoin pattern-match on the 
concrete class FileSourceScanExec in several read-only match sites where only 
trait-level fields (bucketedScan, relation, optionalNumCoalescedBuckets) are 
accessed. The FileSourceScanLike trait already declares all of these fields, 
so the matches can safely be widened.

  This PR changes 3 match sites from FileSourceScanExec to FileSourceScanLike:
  - DisableUnnecessaryBucketedScan.apply — the hasBucketedScan existence check
  - ExtractJoinWithBuckets.hasScanOperation — the bucket spec existence check
  - ExtractJoinWithBuckets.getBucketSpec — the bucket spec extraction

  Two match sites that call .copy() (a case-class-specific method) are 
intentionally left on FileSourceScanExec.
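
  The distinction between the widened sites and the two .copy() sites can be sketched with toy types. This is only an illustration, not Spark's code: ScanLike, ScanExec, PluginScan, and the helper names are hypothetical stand-ins, and only the field names bucketedScan and optionalNumCoalescedBuckets come from the description above.

```scala
object WidenedMatchSketch {
  // Hypothetical stand-in for a scan trait; NOT Spark's FileSourceScanLike.
  trait ScanLike {
    def bucketedScan: Boolean
    def optionalNumCoalescedBuckets: Option[Int]
  }

  // Hypothetical stand-in for the concrete case class.
  final case class ScanExec(
      bucketedScan: Boolean,
      optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

  // A plugin-style operator: extends the trait but is not a case class,
  // so it has no compiler-generated copy().
  final class PluginScan(
      val bucketedScan: Boolean,
      val optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

  // Read-only site: only trait members are read, so a trait match suffices.
  def hasBucketedScan(plan: Any): Boolean = plan match {
    case scan: ScanLike => scan.bucketedScan
    case _ => false
  }

  // Rewriting site: copy() exists only on the case class, so this match
  // has to stay on the concrete type; other scans pass through unchanged.
  def coalesce(plan: ScanLike, numBuckets: Int): ScanLike = plan match {
    case scan: ScanExec => scan.copy(optionalNumCoalescedBuckets = Some(numBuckets))
    case other => other
  }
}
```

  In this sketch, hasBucketedScan sees both ScanExec and PluginScan, while coalesce rewrites only ScanExec and returns any other scan untouched.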

  Why are the changes needed?

  Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
FileSourceScanExec with their own scan operators that extend 
FileSourceScanLike. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules.
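
  A minimal sketch of why a concrete-class match misses plugin operators, assuming a toy plan tree. Plan, ScanLike, VanillaScan, PluginScan, and Join are hypothetical names for illustration and do not reflect the real classes of Spark or of any plugin.

```scala
object PluginVisibilitySketch {
  // Hypothetical, simplified plan node; NOT Spark's SparkPlan.
  trait Plan {
    def children: Seq[Plan]
    // True if this node or any descendant satisfies the predicate.
    def exists(p: Plan => Boolean): Boolean =
      p(this) || children.exists(_.exists(p))
  }

  // Hypothetical scan trait shared by vanilla and plugin scans.
  trait ScanLike extends Plan {
    def bucketedScan: Boolean
    def children: Seq[Plan] = Nil
  }

  final case class VanillaScan(bucketedScan: Boolean) extends ScanLike

  // Plugin replacement: extends the trait, unrelated to VanillaScan.
  final class PluginScan(val bucketedScan: Boolean) extends ScanLike

  final case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
  }

  // Matching on the concrete class: the plugin scan falls to the default case.
  def concreteMatch(plan: Plan): Boolean = plan.exists {
    case s: VanillaScan => s.bucketedScan
    case _ => false
  }

  // Matching on the trait: any operator extending ScanLike is inspected.
  def traitMatch(plan: Plan): Boolean = plan.exists {
    case s: ScanLike => s.bucketedScan
    case _ => false
  }
}
```

  For a plan such as Join(new PluginScan(true), VanillaScan(false)), concreteMatch reports no bucketed scan because the only bucketed operator is the plugin's, whereas traitMatch finds it.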

  This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules, which those fixes did not cover.

  Does this PR introduce any user-facing change?

  No. FileSourceScanExec already extends FileSourceScanLike, so behavior is 
unchanged for vanilla Spark.


  was:
### What changes were proposed in this pull request?

  `DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` pattern-match on 
the concrete class `FileSourceScanExec` in several read-only match sites where 
only trait-level fields (`bucketedScan`, `relation`, 
`optionalNumCoalescedBuckets`) are accessed. The `FileSourceScanLike` trait 
already declares all of these fields, so the matches can safely be widened.

  This PR changes 3 match sites from `FileSourceScanExec` to 
`FileSourceScanLike`:

  - `DisableUnnecessaryBucketedScan.apply` — the `hasBucketedScan` existence 
check
  - `ExtractJoinWithBuckets.hasScanOperation` — the bucket spec existence check
  - `ExtractJoinWithBuckets.getBucketSpec` — the bucket spec extraction

  Two match sites that call `.copy()` (a case-class-specific method) are 
intentionally left on `FileSourceScanExec`.

  ### Why are the changes needed?

  Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
`FileSourceScanExec` with their own scan operators that extend 
`FileSourceScanLike`. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules — 
`DisableUnnecessaryBucketedScan`
  never finds them and `ExtractJoinWithBuckets` never extracts their bucket 
specs.

  This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules which were not covered by those fixes.

  ### Does this PR introduce _any_ user-facing change?

  No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is 
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now 
be recognized by the bucketing rules.

 


> Bucketing rules should match on FileSourceScanLike trait instead of 
> FileSourceScanExec
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-57064
>                 URL: https://issues.apache.org/jira/browse/SPARK-57064
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Norio Akagi
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
