zclllyybb commented on issue #64720:
URL: https://github.com/apache/doris/issues/64720#issuecomment-4775360566
This content is generated by AI for reference only.
Initial analysis:
This looks like a code regression, not a user-side issue. I checked the
reported 4.1.2 release line against local tag `4.1.2-rc01`, and the current
code path explains the empty result.
Evidence:
- The published 4.x `EXPLODE_SPLIT` docs say that when `<str>` is an empty
string or cannot be split, `explode_split` should return one row; the docs also
show `explode_split("", ",")` returning one row with an empty string.
- In 4.1.2, `ExplodeSplit.rewriteWhenAnalyze()` rewrites `explode_split(str,
delimiter)` to `explode(split_by_string(str, delimiter))`. `ExplodeSplitOuter`
similarly rewrites to `explode_outer(split_by_string(...))`.
- Both FE constant folding and BE scalar execution for `split_by_string`
explicitly return an empty array for an empty source string. The generated
regression output currently expects `split_by_string('', ',')` to be `[]`.
- The table-function operator skips a child row when a non-outer table
function has an empty result. Therefore `explode(split_by_string("", ","))`
produces no rows, which matches the reported behavior.
- This differs from the older 3.x implementation. In `3.0.8-rc02`,
`explode_split` still had a dedicated BE `VExplodeSplitTableFunction`; its
split loop produced one empty `StringRef` for an empty input string, so normal
`explode_split("", ",")` emitted one empty-string row. That implementation was
removed by public PR #54886 / commit `39ab12d3e80` (`Rewrite explode_split to
explode + split_by_string`).
- I also checked a freshly fetched `upstream/master`; the same `explode_split
-> explode(split_by_string(...))` rewrite and the same `split_by_string`
empty-source branch are still present, so, judging by the inspected code path,
the regression does not appear to be fixed on master.
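
The chain of evidence above can be modeled in a few lines of Python. This is only a toy sketch, not Doris code: `split_by_string_4x`, `split_loop_3x`, and `explode` are hypothetical stand-ins for the behaviors described in the bullets, and the sketch assumes non-NULL inputs with a non-empty delimiter.

```python
def split_by_string_4x(s, delim):
    """Toy model of the 4.1.2 scalar behavior described above."""
    if s == "":
        # Empty-source branch: an empty input yields an empty array.
        return []
    return s.split(delim)


def split_loop_3x(s, delim):
    """Toy model of the 3.x split loop: an empty input yields one empty piece."""
    # Python's str.split with an explicit separator returns [''] for ''.
    return s.split(delim)


def explode(arr):
    """Non-outer table function: one output row per array element.

    An empty array contributes no rows, which models the operator
    skipping the child row.
    """
    return list(arr)


# 4.1.2 rewrite: explode_split(str, delim) -> explode(split_by_string(str, delim))
rows_4x = explode(split_by_string_4x("", ","))

# 3.x dedicated table function, modeled as explode over the old split loop
rows_3x = explode(split_loop_3x("", ","))

print(len(rows_4x), len(rows_3x))
```

In this model the 4.1.2 path yields zero rows and the 3.x path yields one empty-string row, which is the difference reported in the issue.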
Suggested next steps:
1. Add regression coverage for `select * from example lateral view
explode_split("", ",") t as c;`, expecting one row with an empty string.
2. Decide the fix boundary:
- If the published 4.x `SPLIT_BY_STRING` docs are authoritative too, fix
`split_by_string("", non_null_delimiter)` to return `[""]`, then
`explode_split` will naturally return one row.
- If changing scalar `split_by_string` is considered incompatible with
existing behavior/tests, add an `explode_split`-specific compatibility path so
`explode_split` preserves the documented 3.x/table-function behavior without
globally changing `split_by_string`.
3. Recheck `explode_split_outer("", ",")`, `explode_split(NULL, ",")`,
`NULL` delimiter, and empty-delimiter cases in the same regression suite so
the fix does not blur the distinction between `explode` and `explode_outer`
semantics.
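
The two fix boundaries in step 2 can be contrasted with a small Python model. This is a sketch under stated assumptions, not a proposed patch: all function names are hypothetical, and only non-NULL inputs with a non-empty delimiter are modeled.

```python
def split_by_string_current(s, delim):
    """Toy model of the current scalar behavior: '' -> []."""
    if s == "":
        return []
    return s.split(delim)


def split_by_string_option_a(s, delim):
    """Option A (global fix): '' with a non-NULL delimiter -> ['']."""
    return s.split(delim)


def explode(arr):
    """Non-outer explode: one row per element, no rows for an empty array."""
    return list(arr)


def explode_split_option_a(s, delim):
    # The existing rewrite is kept; the scalar function itself changes.
    return explode(split_by_string_option_a(s, delim))


def explode_split_option_b(s, delim):
    # Option B (scoped fix): a compatibility path only inside explode_split,
    # leaving the scalar split_by_string untouched.
    if s == "":
        return [""]
    return explode(split_by_string_current(s, delim))


for fn in (explode_split_option_a, explode_split_option_b):
    print(fn.__name__, fn("", ","), fn("a,b", ","))
```

Both options return one empty-string row for `explode_split("", ",")` in this model; they differ only in whether the scalar `split_by_string('', ',')` result changes from `[]` to `[""]`.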
No additional logs or runtime profile are needed for the first code-level
fix. The main missing input is the maintainer decision on whether the
compatibility fix should be applied globally to `split_by_string` or scoped
only to `explode_split`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]