[ 
https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norio Akagi updated SPARK-57064:
--------------------------------
    Description: 
  h3. What changes were proposed in this pull request?

  {{DisableUnnecessaryBucketedScan}} and {{CoalesceBucketsInJoin}} 
pattern-match on the concrete class {{FileSourceScanExec}} at several 
read-only match sites where only trait-level fields ({{bucketedScan}}, 
{{relation}}, {{optionalNumCoalescedBuckets}}) are accessed. The 
{{FileSourceScanLike}} trait already declares all of these fields, so the 
matches can safely be widened.

  This PR changes three match sites from {{FileSourceScanExec}} to 
{{FileSourceScanLike}}:

  - {{DisableUnnecessaryBucketedScan.apply}}: the {{hasBucketedScan}} 
existence check
  - {{ExtractJoinWithBuckets.hasScanOperation}}: the bucket spec existence 
check
  - {{ExtractJoinWithBuckets.getBucketSpec}}: the bucket spec extraction

  Two match sites that call {{.copy()}} (a case-class-specific method) are 
intentionally left on {{FileSourceScanExec}}.
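
  The split between widened and unchanged match sites can be sketched with 
toy types. This is a minimal illustration, not Spark code: {{ScanLike}}, 
{{VanillaScan}}, {{PluginScan}} and {{MatchSites}} are hypothetical stand-ins 
for the trait, the concrete case class, a plugin operator, and the rules.

```scala
// Toy stand-ins; the names echo Spark's but these are NOT the real classes.
trait ScanLike {
  def bucketedScan: Boolean
  def optionalNumCoalescedBuckets: Option[Int]
}

// Plays the role of the concrete case class, so it gets a generated copy().
final case class VanillaScan(
    bucketedScan: Boolean,
    optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

// Hypothetical plugin operator: implements the trait but is not a case class.
final class PluginScan(val bucketedScan: Boolean) extends ScanLike {
  val optionalNumCoalescedBuckets: Option[Int] = None
}

object MatchSites {
  // Read-only site: only trait members are read, so matching on the trait
  // is sufficient and also covers the plugin operator.
  def isBucketed(node: Any): Boolean = node match {
    case scan: ScanLike => scan.bucketedScan
    case _              => false
  }

  // Rewriting site: copy() is generated only for the case class, so this
  // match has to stay on the concrete type.
  def coalesce(node: Any, numBuckets: Int): Any = node match {
    case scan: VanillaScan =>
      scan.copy(optionalNumCoalescedBuckets = Some(numBuckets))
    case other => other
  }
}
```

  In this sketch {{MatchSites.isBucketed(new PluginScan(true))}} returns true, 
while {{MatchSites.coalesce}} returns a plugin node unchanged because the 
trait offers no {{copy()}} to call.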

  h3. Why are the changes needed?

  Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
{{FileSourceScanExec}} with their own scan operators that extend 
{{FileSourceScanLike}}. With the current concrete-class matches, these 
plugins' scan operators are invisible to the bucketing rules: 
{{DisableUnnecessaryBucketedScan}} never finds them and 
{{ExtractJoinWithBuckets}} never extracts their bucket specs.
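
  As a rough illustration of that invisibility, the sketch below walks a toy 
plan tree with both kinds of match. All names here ({{Plan}}, 
{{BucketedScanLike}}, {{ConcreteScan}}, {{PluginScan}}, {{Join}}, 
{{BucketCheck}}) are assumptions made for the example and do not correspond 
to Spark's actual API.

```scala
// Toy plan tree; hypothetical types, not Spark's SparkPlan hierarchy.
trait Plan {
  def children: Seq[Plan]
  // True if this node or any descendant satisfies p.
  def exists(p: Plan => Boolean): Boolean =
    p(this) || children.exists(_.exists(p))
}

trait BucketedScanLike extends Plan {
  def bucketedScan: Boolean
  def children: Seq[Plan] = Nil
}

// Stand-in for the built-in scan operator.
final case class ConcreteScan(bucketedScan: Boolean) extends BucketedScanLike

// Stand-in for a plugin's replacement scan operator.
final class PluginScan(val bucketedScan: Boolean) extends BucketedScanLike

final case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

object BucketCheck {
  // Concrete-class match: a plugin scan falls through to the default case.
  def concreteOnly(plan: Plan): Boolean = plan.exists {
    case s: ConcreteScan => s.bucketedScan
    case _               => false
  }

  // Trait-level match: any operator implementing the trait is seen.
  def traitLevel(plan: Plan): Boolean = plan.exists {
    case s: BucketedScanLike => s.bucketedScan
    case _                   => false
  }
}
```

  For a plan such as {{Join(ConcreteScan(false), new PluginScan(true))}}, 
{{BucketCheck.concreteOnly}} returns false and {{BucketCheck.traitLevel}} 
returns true, which is the gap this change closes.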

  This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules, which those fixes did not cover.

  h3. Does this PR introduce any user-facing change?

  No. {{FileSourceScanExec}} already extends {{FileSourceScanLike}}, so 
behavior is unchanged for vanilla Spark. Plugins that extend 
{{FileSourceScanLike}} will now be recognized by the bucketing rules.

  was:
**Summary:** `Bucketing rules should match on FileSourceScanLike trait instead 
of FileSourceScanExec`

**Description:**

### What changes were proposed in this pull request?

`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` pattern-match on 
the concrete class `FileSourceScanExec` in several read-only match sites where 
only trait-level fields (`bucketedScan`, `relation`, 
`optionalNumCoalescedBuckets`) are accessed. The `FileSourceScanLike` trait 
already declares all of these fields, so the matches can safely be widened.

This PR changes 3 match sites from `FileSourceScanExec` to `FileSourceScanLike`:

- `DisableUnnecessaryBucketedScan.apply` — the `hasBucketedScan` existence check
- `ExtractJoinWithBuckets.hasScanOperation` — the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec` — the bucket spec extraction

Two match sites that call `.copy()` (a case-class-specific method) are 
intentionally left on `FileSourceScanExec`.

### Why are the changes needed?

Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
`FileSourceScanExec` with their own scan operators that extend 
`FileSourceScanLike`. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules — 
`DisableUnnecessaryBucketedScan` never finds them and `ExtractJoinWithBuckets` 
never extracts their bucket specs.

This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules which were not covered by those fixes.

### Does this PR introduce _any_ user-facing change?

No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is 
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now 
be recognized by the bucketing rules.


> Bucketing rules should match on FileSourceScanLike trait instead of 
> FileSourceScanExec
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-57064
>                 URL: https://issues.apache.org/jira/browse/SPARK-57064
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Norio Akagi
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
