zclllyybb commented on issue #64720:
URL: https://github.com/apache/doris/issues/64720#issuecomment-4775360566

   Breakwater-GitHub-Analysis-Slot: slot_525edce557da
   This content is generated by AI for reference only.
   
   Initial analysis:
   
   This looks like a code regression, not a user-side issue. I checked the 
reported 4.1.2 release line against local tag `4.1.2-rc01`, and the current 
code path explains the empty result.
   
   Evidence:
   
   - The published 4.x `EXPLODE_SPLIT` docs say that when `<str>` is an empty 
string or cannot be split, `explode_split` should return one row; the docs also 
show `explode_split("", ",")` returning one row with an empty string.
   - In 4.1.2, `ExplodeSplit.rewriteWhenAnalyze()` rewrites `explode_split(str, 
delimiter)` to `explode(split_by_string(str, delimiter))`. `ExplodeSplitOuter` 
similarly rewrites to `explode_outer(split_by_string(...))`.
   - Both FE constant folding and BE scalar execution for `split_by_string` 
explicitly return an empty array for an empty source string. The generated 
regression output currently expects `split_by_string('', ',')` to be `[]`.
   - The table-function operator skips a child row when a non-outer table 
function has an empty result. Therefore `explode(split_by_string("", ","))` 
produces no rows, which matches the reported behavior.
   - This differs from the older 3.x implementation. In `3.0.8-rc02`, 
`explode_split` still had a dedicated BE `VExplodeSplitTableFunction`; its 
split loop produced one empty `StringRef` for an empty input string, so normal 
`explode_split("", ",")` emitted one empty-string row. That implementation was 
removed by public PR #54886 / commit `39ab12d3e80` (`Rewrite explode_split to 
explode + split_by_string`).
   - I also checked freshly fetched `upstream/master`; the same `explode_split 
-> explode(split_by_string(...))` rewrite and the same `split_by_string` 
empty-source branch are still present, so, judging from the inspected code path, 
this does not appear to be fixed on master.
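
The C++ model below makes the chain above concrete. It covers the two split behaviors and shows how a non-outer table function handles an empty array. It is a sketch under stated assumptions, not Doris code:
- `split_model` and `expand_row` are hypothetical names.
- The empty-delimiter case is deliberately not modeled.
- The single NULL row for outer functions is assumed from general `explode_outer` semantics rather than taken from the inspected code.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model, not Doris source. Splits `src` on a non-empty `delim`.
// empty_source_gives_empty_array selects which behavior from the analysis to model:
//   true  -> 4.x split_by_string: "" becomes []
//   false -> 3.x VExplodeSplitTableFunction split loop: "" becomes [""]
std::vector<std::string> split_model(const std::string& src, const std::string& delim,
                                     bool empty_source_gives_empty_array) {
    if (delim.empty()) {
        throw std::invalid_argument("empty delimiter is not modeled in this sketch");
    }
    std::vector<std::string> parts;
    if (src.empty() && empty_source_gives_empty_array) {
        return parts;  // the 4.x empty-source branch
    }
    std::size_t start = 0;
    std::size_t pos = src.find(delim, start);
    while (pos != std::string::npos) {
        parts.push_back(src.substr(start, pos - start));
        start = pos + delim.size();
        pos = src.find(delim, start);
    }
    parts.push_back(src.substr(start));  // last piece; "" for an empty source
    return parts;
}

// Hypothetical model of one child row passing through the table-function operator.
// Non-outer (explode): an empty array emits no rows, so the child row disappears.
// Outer (explode_outer): assumed to emit a single NULL row (std::nullopt) instead.
std::vector<std::optional<std::string>> expand_row(const std::vector<std::string>& arr,
                                                   bool outer) {
    std::vector<std::optional<std::string>> rows;
    for (const auto& s : arr) {
        rows.emplace_back(s);
    }
    if (rows.empty() && outer) {
        rows.emplace_back(std::nullopt);
    }
    return rows;
}
```

In this model, `expand_row(split_model("", ",", true), false)` is empty, which matches the reported 4.1.2 result. `expand_row(split_model("", ",", false), false)` yields one row holding `""`, which matches the 3.x result.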
   
   Suggested next steps:
   
   1. Add regression coverage for `select * from example lateral view 
explode_split("", ",") t as c;`, expecting one row with an empty string.
   2. Decide the fix boundary:
      - If the published 4.x `SPLIT_BY_STRING` docs are authoritative too, fix 
`split_by_string("", non_null_delimiter)` to return `[""]`, then 
`explode_split` will naturally return one row.
      - If changing scalar `split_by_string` is considered incompatible with 
existing behavior/tests, add an `explode_split`-specific compatibility path so 
`explode_split` preserves the documented 3.x/table-function behavior without 
globally changing `split_by_string`.
   3. Recheck `explode_split_outer("", ",")`, `explode_split(NULL, ",")`, 
delimiter `NULL`, and empty delimiter cases with the same regression suite so 
the fix does not blur normal `explode` and `explode_outer` semantics.
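
A similar C++ sketch can compare the two fix options in step 2. Everything here is hypothetical:
- `split_by_string_model` mirrors the scalar behavior described above, with NULL propagation assumed.
- `explode_split_input` is an invented name for an option-B compatibility path.
- Option A would instead change the empty-source branch inside `split_by_string_model` itself.
- Empty delimiters are again left out, since their expected results are part of what step 3 needs to recheck.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// std::nullopt stands in for SQL NULL throughout this sketch.
using NullableStr = std::optional<std::string>;
using NullableArr = std::optional<std::vector<std::string>>;

// Sketch of the current scalar split_by_string behavior (not Doris code).
// Assumed: a NULL argument yields a NULL array. From the analysis: "" yields [].
NullableArr split_by_string_model(const NullableStr& src, const NullableStr& delim) {
    if (!src || !delim) {
        return std::nullopt;
    }
    if (delim->empty()) {
        throw std::invalid_argument("empty delimiter is not modeled in this sketch");
    }
    std::vector<std::string> parts;
    if (src->empty()) {
        return parts;  // empty-source branch; option A would return {""} here instead
    }
    std::size_t start = 0;
    std::size_t pos = src->find(*delim, start);
    while (pos != std::string::npos) {
        parts.push_back(src->substr(start, pos - start));
        start = pos + delim->size();
        pos = src->find(*delim, start);
    }
    parts.push_back(src->substr(start));
    return parts;
}

// Option B sketch (hypothetical function): an explode_split-only compatibility path
// that maps an empty, non-NULL source to [""] and otherwise defers to the unchanged
// scalar function, so split_by_string('', ',') keeps returning [].
NullableArr explode_split_input(const NullableStr& src, const NullableStr& delim) {
    if (src && delim && !delim->empty() && src->empty()) {
        return std::vector<std::string>{""};
    }
    return split_by_string_model(src, delim);
}

// Table-function step (assumed semantics): a NULL or empty array emits no rows for
// explode, and a single NULL row for explode_outer.
std::vector<NullableStr> expand(const NullableArr& arr, bool outer) {
    std::vector<NullableStr> rows;
    if (arr) {
        for (const auto& s : *arr) {
            rows.emplace_back(s);
        }
    }
    if (rows.empty() && outer) {
        rows.emplace_back(std::nullopt);
    }
    return rows;
}
```

Under option B in this model:
- `explode_split("", ",")` yields one `""` row.
- NULL inputs still flow through as a NULL array, so `explode` emits no rows for them and `explode_outer` emits one NULL row.
- The scalar `split_by_string('', ',')` result `[]` stays untouched.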
   
   No additional logs or runtime profiles are needed for the first code-level 
fix. The main missing input is a maintainer decision on whether the 
compatibility fix should be applied globally to `split_by_string` or scoped 
only to `explode_split`.
   