jnh5y opened a new pull request, #28556:
URL: https://github.com/apache/flink/pull/28556

   ## What is the purpose of the change
   
   `PushFilterInCalcIntoTableSourceScanRule` — the rule that fires when a 
`Filter` sits above a `Calc` above a table source scan — was unaware of 
metadata filter push-down, causing two bugs:
   
   1. **Crash on compiled-plan restore.** For sources implementing both 
`SupportsFilterPushDown` and `SupportsReadingMetadata`, the rule routed all 
predicates (including metadata-column predicates) through the physical path, 
storing a `FilterPushDownSpec` with raw `RexInputRef` indices. When 
`ProjectPushDownSpec` later narrowed the scan's row type, those indices no 
longer corresponded to the right columns, crashing restore.
   
   2. **Missed push-down for metadata-only sources.** The rule's `matches()` 
guard only checked `canPushdownFilter()` (physical), so for sources 
implementing only `SupportsReadingMetadata` the rule never fired at all.
   
   This PR also fixes a latent classification bug in both rules: the old 
heuristic detected metadata columns via `inputRef.getIndex() >= 
physicalColumnCount`, which is incorrect after projection narrows the scan 
type. Mixed predicates like `OR(id > 0, m0 > 0)` (spanning physical and 
metadata columns) must stay as runtime Calc filters but were instead routed 
into the physical push-down path.
   
   ## Brief change log
   
     - `PushFilterInCalcIntoTableSourceScanRule.matches()` now also checks 
`canPushdownMetadataFilter()`, enabling the rule to fire for metadata-only 
sources
     - Extract duplicated classify-and-push logic into 
`classifyAndPushFilters()` in `PushFilterIntoSourceScanRuleBase`, shared by 
both filter-push-down rules
     - Replace the `physicalColumnCount` heuristic with a `Set<Integer>` of 
metadata column indices derived from the scan's current row type via the 
column-to-metadata-key map, correctly handling projection-narrowed scans
     - Add `referencesAnyMetadataColumns()` to detect mixed predicates and 
route them to the runtime Calc instead of the physical push-down path
     - Document the identity contract on 
`SupportsReadingMetadata.applyMetadataFilters` Javadoc
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - Added `testMixedOrPredicateStaysAsRuntimeFilter` to 
`MetadataFilterResultShapesITCase`: asserts that a predicate touching both 
physical and metadata columns (`id < 3 OR m0 > 0`) never appears in 
`filter=[...]` or `metadataFilter=[...]` and produces correct runtime results
     - Added `testMixedOrWithSeparatePhysicalPredicate` to 
`MetadataFilterResultShapesITCase`: asserts that a pure physical predicate is 
still pushed while a mixed OR predicate stays as a runtime Calc
     - Existing `MetadataFilterResultShapesITCase` tests (accept-all, 
accept-none, partial accept, best-effort overlap) continue to pass, covering 
the `PushFilterInCalcIntoTableSourceScanRule` path via the shared 
`classifyAndPushFilters()` method
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes (`SupportsReadingMetadata` — Javadoc only, no 
interface change)
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (Claude Sonnet 4.6)
   
   <!--
   Generated-by: Claude Sonnet 4.6
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to