felipecrv commented on issue #41453:
URL: https://github.com/apache/arrow/issues/41453#issuecomment-2150735331
Expanding. Let's say we have the following relational query formula that
produces a list-view typed column from list-views `a`, `b`, and `c`:
```sql
CASE
WHEN list_value_length(a) > 1 THEN list_slice(a, 1)
WHEN list_value_length(b) > 1 THEN list_slice(b, 1)
ELSE coalesce(c, [0])
END AS resulting_list
```
In a LISP-like intermediate representation, this can be written as:
```lisp
(cond
(> (list_value_length a) 1) (list_slice a 1)
(> (list_value_length b) 1) (list_slice b 1)
else (coalesce b [0]))
```
Due to how expressions are currently represented in `arrow::compute` we are
forced to eagerly eval all the conditions and branches. Even when the condition
is false for that branch. We need `cond` to be a special-form [1] meaning that
all expressions are passed "by name" [2].
The evaluation of the expressions for this example should happen in this
order:
```cpp
auto output = pre_allocate_list_view(batch.length);
// builds an array with the lengths of a
auto length_a = list_value_length(a);
// builds the first pair of sel. vectors (for the first condition)
auto sel0_true, sel0_false = greater_than(length_a, 1);
// evaluate the first expression only on positions in sel0_true.
// since `output` is a list-view, all these random positions in the
// output can be written now before the other branches write to
// their positions.
// (we pass sel0_true to the list_slice kernel as an optional parameter)
list_slice[sel0_true](a, 1, &output);
// builds an array with the lengths of b
// (note that now we pass the sel0_false selection vector because
// we only have to run `list_value_length` on the positions where the
// previous conditions were false)
auto length_b = list_value_length[sel0_false](b);
// builds the next pair of selection vectors
// (note that greater_than[sel0_false] only checks the relevant positions,
// consequently, sel1_true/false are going to be smaller than the first
// pair of selection vectors)
auto sel1_true, sel1_false = greater_than[sel0_false](length_a, 1);
// evaluate the second branch's expression. none of the positions
// in sel1_true were present in sel0_true, so we're only writing on
// values that weren't written before
list_slice[sel1_true](b, 1, &output);
// now evaluate the else case with the last negative selection vector
coalesce[sel1_false](b, [0])
```
[1] https://github.com/apache/arrow/issues/41094#issuecomment-2087716483
[2] [Call by
name](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_name) (easily
and often confused as "lazy evaluation")
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]