liaoxin01 opened a new pull request, #64755:
URL: https://github.com/apache/doris/pull/64755

   ## Proposed changes
   
   ### Problem
   
   When an `INSERT`/`CTAS` (or any query) hits a data-conversion error during 
scan, the user-facing error is misleading. For example, a strict 
`CAST(... AS BIGINT)` applied to an empty string produced by `regexp_extract` 
fails with `INVALID_ARGUMENT parse number fail, string: ''`, but the error 
shown to the user reads:
   
   ```
   [INVALID_ARGUMENT]parse number fail, string: ''failed to initialize storage 
reader. tablet=421411554, backend=10.228.1.18
   ```
   
   The `failed to initialize storage reader. tablet=...` suffix makes it look 
like the tablet/segment is corrupted or missing, when the real cause is a 
data/expression error.
   
   ### Root cause
   
   `OlapScanner::_open_impl` appended `failed to initialize storage reader. 
tablet=...` to **any** non-OK status returned by `TabletReader::init()`. But 
`init()` does not merely set up objects — the merge reader eagerly reads the 
first block of each rowset (`BlockReader::_init_collect_iter` → 
`VCollectIterator::build_heap` → `Level0Iterator::refresh_current_row` → 
`RowsetReader::next_batch`) to seed the merge heap. Pushed-down expressions 
(`common_expr_ctxs`) are evaluated during that first-block read, so a 
strict-cast failure surfaces inside `init()` and gets wrapped with the 
storage-reader message.
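
The mechanism can be illustrated with a minimal, self-contained sketch. Every name below (`Status`, `RowSource`, `MergeReader`, `strict_cast_to_int64`) is a hypothetical stand-in, not a Doris type; the only point is that an `init()` which seeds a merge heap must read and evaluate the first row of each source, so an expression error can surface before any "real" read happens.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; not Doris's Status.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
    static Status InvalidArgument(const std::string& m) {
        return Status{false, "[INVALID_ARGUMENT]" + m};
    }
};

// Strict string-to-integer cast: any empty or non-numeric input is an error.
Status strict_cast_to_int64(const std::string& s, long long* out) {
    std::size_t consumed = 0;
    try {
        *out = std::stoll(s, &consumed);
    } catch (const std::exception&) {
        consumed = 0;
    }
    if (s.empty() || consumed != s.size()) {
        return Status::InvalidArgument("parse number fail, string: '" + s + "'");
    }
    return Status::OK();
}

// One input stream; the "pushed-down expression" is evaluated on each read.
class RowSource {
public:
    explicit RowSource(std::vector<std::string> rows) : rows_(std::move(rows)) {}
    bool exhausted() const { return pos_ >= rows_.size(); }
    Status next(long long* out) {
        assert(!exhausted());
        return strict_cast_to_int64(rows_[pos_++], out);
    }

private:
    std::vector<std::string> rows_;
    std::size_t pos_ = 0;
};

class MergeReader {
public:
    explicit MergeReader(std::vector<RowSource> sources) : sources_(std::move(sources)) {}

    // Seeding the heap requires the first row of every source, so a cast
    // failure in any first row is returned from init() itself.
    Status init() {
        for (auto& src : sources_) {
            if (src.exhausted()) continue;
            long long value = 0;
            Status st = src.next(&value);
            if (!st.ok) return st;
            heap_.push(value);
        }
        return Status::OK();
    }

    std::size_t heap_size() const { return heap_.size(); }

private:
    std::vector<RowSource> sources_;
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap_;
};
```

In this sketch, a source whose first row is the empty string makes `init()` return the `INVALID_ARGUMENT` status directly, which is the situation in which the caller previously appended the storage-reader suffix.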
   
   ### Fix
   
   Branch on the error code: only genuine storage-level failures keep the 
`failed to initialize storage reader` wording. For `INVALID_ARGUMENT` 
(data/expression errors) the message stays neutral and explicitly notes it is a 
data/expression error rather than a storage failure, while still reporting 
tablet/backend for locating the node.
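
As a rough sketch of the branching described above (not the actual patch): the `Status` struct, `ErrorCode` enum, `annotate_init_error` helper, and the exact wording of the neutral message are all assumptions made for illustration. Only the `failed to initialize storage reader` text and the tablet/backend fields come from the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; not Doris's real Status API.
enum class ErrorCode { OK, INVALID_ARGUMENT, IO_ERROR };

struct Status {
    ErrorCode code;
    std::string msg;
    bool ok() const { return code == ErrorCode::OK; }
};

// Adds tablet/backend context to a failed init() status.
// Only the message changes; the error code is passed through untouched.
Status annotate_init_error(const Status& st, long long tablet_id,
                           const std::string& backend) {
    if (st.ok()) return st;

    const std::string where =
            " tablet=" + std::to_string(tablet_id) + ", backend=" + backend;

    Status out = st;
    if (st.code == ErrorCode::INVALID_ARGUMENT) {
        // Data/expression error: keep the message neutral.
        // (This wording is illustrative, not the text used in the PR.)
        out.msg += ". data/expression error, not a storage failure." + where;
    } else {
        // Storage-level failure: keep the original wording.
        out.msg += ". failed to initialize storage reader." + where;
    }

    assert(out.code == st.code);  // diagnostics-only change
    return out;
}
```

With this shape, the strict-cast failure from the example would still carry `tablet=421411554, backend=10.228.1.18`, but would no longer mention the storage reader.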
   
   This is purely a message/diagnostics change; control flow and the returned 
error code are unchanged. The underlying strict-cast semantics issue is tracked 
separately (see apache/doris#64266).
   
   ## Further comments
   
   No behavior change other than the error text; no new tests added.



