WangGuangxin opened a new pull request, #9315:
URL: https://github.com/apache/incubator-gluten/pull/9315
## What changes were proposed in this pull request?
`ColumnarPartialProject` can also support built-in functions, especially
blacklisted expressions.
One typical scenario is regexp. The native regexp library `re2` is much slower
than the Java regexp library, and it also differs semantically from the Java library in some cases.
Take a simple SQL query from our production as an example:
```
SELECT p_date,
cast(id AS BIGINT) AS creative_id,
regexp_replace(DATA, '|| ', '') AS tmp
FROM test_table
WHERE p_date='20250227'
```
| Test | Cost |
|--------|--------|
| Partial Project Fallback | 1284h |
| Whole Project Fallback | 1389h |
| Native (No Fallback) | 3384h |
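To make the Java-side semantics concrete, here is a minimal sketch of how `java.util.regex` treats the pattern from the query, assuming the pattern reaches the Java engine exactly as written (`|| `). The class and method names are hypothetical. It shows only the Java behavior and makes no claim about what `re2` does with the same pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Gluten or Spark.
final class RegexAlternationDemo {
    // Pattern as written in the query: an alternation of "", "" and " ".
    static final Pattern UNESCAPED = Pattern.compile("|| ");
    // Escaped variant that matches the literal text "|| ".
    static final Pattern LITERAL = Pattern.compile(Pattern.quote("|| "));

    static String replaceUnescaped(String input) {
        // Java tries alternatives in order, so the empty alternative matches
        // at every position and the space alternative is never reached.
        return UNESCAPED.matcher(input).replaceAll("");
    }

    static String replaceLiteral(String input) {
        return LITERAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String data = "a|| b c";
        System.out.println(replaceUnescaped(data)); // prints "a|| b c" (unchanged)
        System.out.println(replaceLiteral(data));   // prints "ab c"
    }
}
```

Edge cases like this empty alternative are where two regexp engines are most likely to disagree, which is one reason to keep such expressions on the JVM side.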
The detailed plans are:
- Partial Project Fallback
<img width="307" alt="image"
src="https://github.com/user-attachments/assets/5126ad20-cfda-4fa3-8465-5856647292d4"
/>
- Whole Project Fallback
<img width="317" alt="image"
src="https://github.com/user-attachments/assets/f2c79be4-3f6e-472c-baa6-764d760db401"
/>
- Native (No Fallback)
<img width="347" alt="image"
src="https://github.com/user-attachments/assets/8b7f9524-1676-4ce5-a83a-adb1ea477fba"
/>
This PR uses `ColumnarPartialProject` to handle the blacklisted
expressions.
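The idea behind a partial fallback can be sketched roughly as follows. This is only an illustration of the partitioning step, under the assumption that blacklisted expressions are evaluated on the JVM side while the remaining ones stay native. `Expr`, `Split`, and `split` are hypothetical names, not Gluten APIs. The real operator works on Spark expression trees and also handles the row/columnar conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Gluten.
final class PartialProjectSketch {
    // A projected column, reduced to its output alias and top-level function name.
    static final class Expr {
        final String alias;
        final String function;
        Expr(String alias, String function) {
            this.alias = alias;
            this.function = function;
        }
    }

    // Result of partitioning a project list.
    static final class Split {
        final List<Expr> fallback = new ArrayList<>();   // evaluated by the JVM
        final List<Expr> nativeSide = new ArrayList<>(); // left to the native engine
    }

    // Route blacklisted expressions to the fallback side and keep the rest native.
    static Split split(List<Expr> projectList, Set<String> blacklist) {
        Split result = new Split();
        for (Expr e : projectList) {
            if (blacklist.contains(e.function)) {
                result.fallback.add(e);
            } else {
                result.nativeSide.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the three output columns of the example query.
        List<Expr> project = Arrays.asList(
            new Expr("p_date", "column"),
            new Expr("creative_id", "cast"),
            new Expr("tmp", "regexp_replace"));
        Set<String> blacklist = new HashSet<>(Arrays.asList("regexp_replace"));
        Split s = split(project, blacklist);
        System.out.println("fallback: " + s.fallback.size());  // prints "fallback: 1"
        System.out.println("native: " + s.nativeSide.size());  // prints "native: 2"
    }
}
```

With a whole-project fallback, all three columns would leave the native engine. Here only `tmp` does.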
(Fixes: #9313)
## How was this patch tested?
Added more unit tests.