zclllyybb commented on issue #64836: URL: https://github.com/apache/doris/issues/64836#issuecomment-4796863622
Breakwater-GitHub-Analysis-Slot: slot_bd75fca6c132

This content is generated by AI for reference only.

Initial code-level triage: this report looks valid and points to a real BE crash bug, not a user-side load configuration problem.

What I checked:

- On the refreshed `upstream/branch-4.0` ref (`e9057613d08`), `be/src/vec/exec/scan/file_scanner.cpp` initializes `_src_slot_descs_order_by_dest` only inside `if (_params->__isset.dest_sid_to_src_sid_without_trans)`, but `_convert_to_output_block()` later reads `_src_slot_descs_order_by_dest[dest_index]` in the strict-mode NULL branch without first proving that the vector was populated.
- On current `upstream/master` (`556586ce729`), the scanner file has moved to `be/src/exec/scan/file_scanner.cpp`, but the same access pattern is still present.
- FE load planning in `fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadPlanInfoCollector.java` only fills `destSlotIdToSrcSlotIdWithoutTrans` for target slots that are direct table-column mappings. With `columns:c1,c2,k=c1,v=c2`, the target table columns are produced through expressions, so that map can legitimately be empty and the optional Thrift field is not set.

Failure chain:

1. The source tuple has file slots such as `c1` and `c2`.
2. The destination tuple has table slots `k` and `v`, both evaluated through expressions `k=c1` and `v=c2`.
3. Since there is no direct destination-column to source-column mapping, FE can send no `dest_sid_to_src_sid_without_trans`.
4. BE still builds `_dest_vexpr_ctx` for the destination columns, but leaves `_src_slot_descs_order_by_dest` empty.
5. When an expression result is NULL, `FileScanner::_convert_to_output_block()` enters the strict-mode branch and indexes `_src_slot_descs_order_by_dest[dest_index]`. For an empty vector this is out-of-bounds and can SIGSEGV before the regular nullable-column handling is reached.
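To make step 5 concrete, here is a minimal standalone sketch of the access pattern with a bounds-checked lookup in place of the raw index. It is not the Doris code: `SrcSlot`, `NullOutcome`, `direct_source_slot`, `classify_null_result`, and `example_usage` are hypothetical names introduced only for illustration, and the vector parameter merely stands in for `_src_slot_descs_order_by_dest`. The assumption is that a missing or null entry means "this destination column is expression-derived".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a source slot descriptor; not the Doris type.
struct SrcSlot {
    std::string name;
};

// Hypothetical outcomes for a destination value that evaluated to NULL.
enum class NullOutcome {
    kStrictModeReject,  // direct source value was non-NULL but converted to NULL
    kAcceptNull,        // destination column is nullable, NULL is allowed
    kNotNullReject      // destination column is NOT NULL, row is rejected
};

// Returns the direct source slot for dest_index, or nullptr when there is
// none. The size check is what makes an empty vector safe to query; a raw
// src_by_dest[dest_index] on an empty vector is undefined behavior.
inline const SrcSlot* direct_source_slot(
        const std::vector<const SrcSlot*>& src_by_dest, std::size_t dest_index) {
    if (dest_index >= src_by_dest.size()) {
        return nullptr;
    }
    return src_by_dest[dest_index];
}

// Classifies a NULL expression result for one destination column.
// source_value_is_null is only consulted when a direct mapping exists.
inline NullOutcome classify_null_result(
        const std::vector<const SrcSlot*>& src_by_dest, std::size_t dest_index,
        bool strict_mode, bool source_value_is_null, bool dest_is_nullable) {
    const SrcSlot* src = direct_source_slot(src_by_dest, dest_index);
    if (strict_mode && src != nullptr && !source_value_is_null) {
        return NullOutcome::kStrictModeReject;
    }
    // No direct source value to report: fall through to ordinary nullability.
    return dest_is_nullable ? NullOutcome::kAcceptNull
                            : NullOutcome::kNotNullReject;
}

// Usage example modelled on `columns:c1,c2,k=c1,v=c2`: every destination
// column is expression-derived, so the mapping vector stays empty (step 4).
inline void example_usage() {
    std::vector<const SrcSlot*> no_direct_mappings;
    assert(classify_null_result(no_direct_mappings, 1, /*strict_mode=*/true,
                                /*source_value_is_null=*/false,
                                /*dest_is_nullable=*/true) ==
           NullOutcome::kAcceptNull);
}
```

In this model an empty mapping vector never reaches the strict-mode rejection, while a destination slot that does have a direct source slot still does. Whether the real scanner should bounds-check or pre-size the vector with `nullptr` entries is a design choice the sketch does not settle.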
Recommended fix direction:

- Treat "no direct source slot for this destination slot" as a valid state for expression-derived columns.
- In BE, either size `_src_slot_descs_order_by_dest` to the destination-slot count and fill missing entries with `nullptr`, or guard the strict source-value rejection path with an explicit "direct source mapping exists for this dest slot" check.
- Avoid using `_dest_slot_to_src_slot_index[dest_index]` unless that direct mapping has been proven to exist. A `find()`/checked lookup or a shared helper would avoid accidental insertion through `operator[]`.
- For expression-derived columns without a direct source mapping, fall through to the existing nullable check. That preserves the intended distinction: strict mode can reject "source value was non-NULL but converted to NULL" only when there is a direct source value to report; ordinary destination nullability still decides whether a NULL derived result is allowed.

Suggested regression coverage:

- Stream Load CSV case with `strict_mode=true`, `columns:c1,c2,k=c1,v=c2`, and one row `1,\N` into a table where `v` is nullable. Expected: no BE crash; the load should complete according to normal nullable semantics.
- A companion case where the derived destination column is `NOT NULL`. Expected: no BE crash; the row should be rejected by the normal not-null load validation path.
- A mixed direct plus derived-column case should also be kept to ensure the strict-mode "bad source value" error path still reports the original source value when a direct mapping exists.

The only extra information that would help confirm deployment impact is the exact 4.0.x build or commit and the BE stack trace from `be.out`, but the code path above is already sufficient to justify a fix.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
