Re: [I] test github issues [doris]

via GitHub Mon, 25 May 2026 06:20:48 -0700


zclllyybb commented on issue #63629:
URL: https://github.com/apache/doris/issues/63629#issuecomment-4534590780


   Breakwater-GitHub-Analysis-Slot: slot_24015f562221
   
   Initial triage for maintainers:
   
   - Current issue metadata: open issue, version `4.0.5`, no labels/comments at 
the time of this check. The title is still generic (`test github issues`), but 
the body contains a real BE SIGSEGV stack.
   - The reported BE commit `59de8c4c524` matches the local `4.0.5` / 
`4.0.5-rc01` tag.
   - The visible crash frame is `ScannerContext::get_free_block()` calling 
`_free_blocks.try_dequeue()` 
(`be/src/vec/exec/scan/scanner_context.cpp:197-209`). I would not treat 
`moodycamel::ConcurrentQueue` as the root cause from this stack alone; it is 
likely the victim frame after earlier memory/schema corruption.
   - In `4.0.5`, `ScannerScheduler::_scanner_scan()` calls `scanner->prepare()` 
/ `scanner->open()` before the read loop calls `ctx->get_free_block()` 
(`be/src/vec/exec/scan/scanner_scheduler.cpp:192-263`). So the corrupted state 
can already exist before the top frame of this stack is reached.
   - The high-risk code path is `OlapScanner::prepare()` in `4.0.5`: it can 
reuse a shared `TabletSchema` from `SchemaCache` 
(`be/src/vec/exec/scan/olap_scanner.cpp:193-200`), and later 
`_init_tablet_reader_params()` may mutate that schema via 
`merge_dropped_columns()` when delete predicates are present 
(`be/src/vec/exec/scan/olap_scanner.cpp:395-398`; 
`be/src/olap/tablet_schema.cpp:1361-1377`). With parallel scan enabled by 
default in FE session variables, multiple scanners can prepare concurrently and 
share/mutate the cached schema.
   - This matches the later branch-4.0 fix lineage: #61853 avoids caching 
schemas that would be mutated by delete predicates, and #62427 removes 
`SchemaCache` from `OlapScanner::prepare()` so every scanner builds its own 
`TabletSchema` to avoid concurrent modification. I checked locally that neither 
the #61853 commit nor the #62427 commit is included in `4.0.5`.
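
   To make the suspected hazard concrete, here is a minimal single-threaded 
sketch of the aliasing involved. It is not Doris code: `ToySchema`, 
`make_schema`, `prepare_shared`, `prepare_private`, and `columns_seen_by_b` are 
hypothetical names, and the toy `merge_dropped_columns()` only imitates the 
idea of mutating a schema in place. It shows that when two scanners hold the 
same cached schema object, a mutation made through one is visible through the 
other, while a per-scanner copy (the approach described for #62427) keeps them 
isolated. In the real BE the two scanners would run on different threads, which 
turns this aliasing into a data race.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tablet schema; NOT the Doris TabletSchema class.
struct ToySchema {
    std::vector<std::string> columns;

    // Appends every column of `other` that this schema lacks (in-place mutation).
    void merge_dropped_columns(const ToySchema& other) {
        for (const auto& name : other.columns) {
            if (std::find(columns.begin(), columns.end(), name) == columns.end()) {
                columns.push_back(name);
            }
        }
    }
};

// Hypothetical helper that builds a schema from a list of column names.
std::shared_ptr<ToySchema> make_schema(std::vector<std::string> names) {
    auto schema = std::make_shared<ToySchema>();
    schema->columns = std::move(names);
    return schema;
}

// Risky variant: the scanner reuses the cached object itself.
std::shared_ptr<ToySchema> prepare_shared(const std::shared_ptr<ToySchema>& cached) {
    return cached;
}

// Safer variant: the scanner builds its own private copy.
std::shared_ptr<ToySchema> prepare_private(const std::shared_ptr<ToySchema>& cached) {
    return std::make_shared<ToySchema>(*cached);
}

// Returns how many columns scanner B sees after scanner A merges a dropped column.
std::size_t columns_seen_by_b(bool share_cached_schema) {
    auto cached = make_schema({"k1", "v1"});
    auto with_dropped = make_schema({"k1", "v1", "dropped_col"});

    auto a = share_cached_schema ? prepare_shared(cached) : prepare_private(cached);
    auto b = share_cached_schema ? prepare_shared(cached) : prepare_private(cached);
    assert(a && b);

    a->merge_dropped_columns(*with_dropped);  // scanner A mutates "its" schema
    return b->columns.size();                 // scanner B's view of the schema
}
```

   With sharing, `columns_seen_by_b(true)` reports three columns because B's 
schema changed underneath it; with private copies, `columns_seen_by_b(false)` 
still reports two.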
   
   Preliminary judgment:
   
   This looks highly likely to be the known 4.0.5 `OlapScanner` / `SchemaCache` 
concurrent schema mutation crash, not a queue implementation bug. The most 
direct validation is to retry on a branch-4.0 build that includes #62427, or 
cherry-pick #62427 onto the affected 4.0.5 build. As a short-term 
mitigation/diagnostic step, retry the same workload with 
`set enable_parallel_scan=false`; if the crash disappears, that further 
supports the concurrent scanner prepare path as the cause.
   
   Missing information still needed for a hard reproduction:
   
   - The exact SQL text for query id `64e864ce12f5486b-89edc47b0e6e87e9`, 
including whether it is `INSERT INTO ... SELECT ...`.
   - DDL for the source/target tables, especially key model, delete predicates, 
indexes, schema-change history, and any dropped columns.
   - FE audit/profile for the query and BE logs around this query id before the 
SIGSEGV.
   - Session variables and BE/FE configs, especially `enable_parallel_scan`, 
`parallel_scan_max_scanners_count`, and whether local mode changes scanner 
scheduling.
   - Confirmation whether the same workload still crashes on a build containing 
#62427.
   
   Suggested next action:
   
   Ask the reporter to retest with a build including #62427 or temporarily 
disable parallel scan. If it no longer reproduces, this issue can be linked to 
that fix path. If it still reproduces, please attach the SQL/DDL/log/profile 
above so we can separate it from the already-fixed `SchemaCache` concurrency 
issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


