924060929 commented on PR #64847:
URL: https://github.com/apache/doris/pull/64847#issuecomment-4808089222

   For reviewers — a bit more on the root cause, since it's subtle.
   
   `array_first`/`array_last` desugar to:
   
   ```
   array_first(lambda, arr) = element_at(array_filter(arr, array_map(lambda)), 
1)
   ```
   
   The "mask must be `array<boolean>`" requirement lives **only** on 
`array_filter`'s 2nd argument.
   
   **Why the old code rejected a castable lambda result**
   
   Two things combined so the implicit cast was never applied:
   
   1. `ArrayFirst extends ElementAt`, and the desugaring happened **in the 
constructor** — i.e. at bind time, *before* type coercion. So 
`processBoundFunction` sees the node through `ElementAt`'s signature 
`(array<any>, bigint)`. The inner `array_filter` result (`array<int>`) fits the 
`array<any>` slot, so no cast is inserted and coercion never descends into it.
   2. That inner `array_filter` is created inside the constructor and is never 
visited by the analyzer as a function of its own, so `processBoundFunction` 
never runs against `array_filter`'s own `(array<any>, array<boolean>)` 
signature — nothing ever casts the `array_map` result.
   
   So for e.g. `array_first(x -> x, [0,1,2])`, the `array<int>` mask stays 
uncast and is only caught later by `checkInputDataTypes`, which recurses into 
the inner `array_filter`, finds `array<int>` where `array<boolean>` is 
required, and raises an error — even though `int` -> `boolean` is a legal 
implicit cast.
   
   **Why this PR fixes it**
   
   The core idea: hoist the boolean requirement onto `array_first` itself, and 
do the cast *before* the rewrite.
   
   - `array_first`/`array_last` now `extends ScalarFunction` with their **own** 
signature `(array<any>, array<boolean>)`, instead of impersonating `ElementAt`.
   - The desugaring is deferred to `rewriteWhenAnalyze()`, which the analyzer 
runs **after** `processBoundFunction`.
   
   Resulting order:
   
   1. `processBoundFunction(ArrayFirst)` coerces against the new signature → 
the `array_map` argument (now a *direct* child) is cast to `array<boolean>`.
   2. `rewriteWhenAnalyze()` then builds `element_at(array_filter(arr, 
<already-cast array_map>), 1)` via the 2-arg `array_filter` constructor, which 
takes the coerced argument as-is.
   
   So the cast is inserted before the node collapses into 
`element_at(array_filter(...))`, and the inner `array_filter` finally receives 
an `array<boolean>` mask — which is what lets lambda results that are 
implicitly castable to boolean work.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to