zclllyybb commented on issue #64006:
URL: https://github.com/apache/doris/issues/64006#issuecomment-4599813365

   Initial triage: this looks like a real Nereids/external-view schema-drift bug on current master, not just a reporting artifact.
   
   I re-read the issue and checked the referenced upstream tree (`aa9162840f1`). The reported failure path matches the code:
   
   - For HMS external views, `BindRelation` reads `HMSExternalTable.getViewText()`, analyzes that SQL in the external catalog/database context, and wraps the analyzed plan in `LogicalView(new ExternalView(table, ddlSql), ...)`.
   - `ExternalView.getFullSchema()` delegates to the external view table's own `getFullSchema()`, i.e. the schema cached/loaded for the HMS view object.
   - `REFRESH TABLE <base_table>` only invalidates the cache entry for the selected external table, via `RefreshManager.refreshTableInternal(...) -> invalidateTableCache(table)`. It does not refresh the separate view table object.
   - `LogicalView.computeOutput()` then iterates over `child().getOutput()` but indexes `view.getFullSchema().get(i)` for every child slot. If the view SQL is re-analyzed after the base-table refresh and `SELECT *` now expands to 4 child slots while the view object's stored schema still has 3 columns, the `get(3)` call is exactly what produces `Index 3 out of bounds for length 3`.
   
   So the issue's proposed diagnosis is consistent with the current code chain. The existing `CollectionUtils.isEmpty(view.getFullSchema())` guard only protects against a null or empty schema; it does not protect against a non-empty but shorter stored view schema.
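To make the failure mode concrete, here is a minimal, self-contained Java sketch of the unguarded indexing pattern. It is hypothetical: `ViewOutputDrift` and `computeOutputUnguarded` are illustrative names, and plain strings stand in for Doris's `Slot` and `Column` types. It only models the loop shape described above (one stored-schema lookup per child slot, behind an empty-schema guard), not the real `LogicalView.computeOutput()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the unguarded indexing in
// LogicalView.computeOutput(); strings replace Doris's Slot/Column types.
class ViewOutputDrift {

    static List<String> computeOutputUnguarded(List<String> childOutput, List<String> fullSchema) {
        List<String> out = new ArrayList<>(childOutput.size());
        for (int i = 0; i < childOutput.size(); i++) {
            if (fullSchema == null || fullSchema.isEmpty()) {
                // Like the existing CollectionUtils.isEmpty(...) guard: only
                // catches a missing or empty stored schema.
                out.add("slot:" + childOutput.get(i));
            } else {
                // One stored column per child slot; fails once i == fullSchema.size().
                out.add("col:" + fullSchema.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stored HMS view schema, loaded before the base table gained a column.
        List<String> stored = new ArrayList<>(List.of("a", "b", "c"));
        // Re-analyzed `SELECT *` view body after REFRESH TABLE <base_table>.
        List<String> child = List.of("a", "b", "c", "d");
        try {
            computeOutputUnguarded(child, stored);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With a non-empty 3-column stored schema and a 4-slot child output, the empty-schema guard is skipped and the fourth iteration calls `get(3)` on a 3-element list. On recent JDKs an `ArrayList` reports that as `Index 3 out of bounds for length 3`, the same message as in the issue.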
   
   The suggested local fix is reasonable as a minimal crash fix:
   
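   ```java
   // Ideally read once, before the loop over child slots.
   List<Column> fullSchema = view.getFullSchema();
   if (CollectionUtils.isEmpty(fullSchema) || i >= fullSchema.size()) {
       // No stored column at this position (schema missing, or shorter than the
       // re-analyzed child output): keep the slot and apply only the view qualifier.
       qualified = originSlot.withQualifier(fullQualifiers);
   } else {
       // A stored column exists at position i: attach the view, that column, and the qualifier.
       qualified = originSlot.withOneLevelTableAndColumnAndQualifier(view,
               fullSchema.get(i), fullQualifiers);
   }
   ```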
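The guarded loop can be exercised outside Doris with a similarly simplified sketch. This is hypothetical: `GuardedViewOutput` is an illustrative name and strings replace `Slot`/`Column`. In the output, `col:<name>` marks a slot bound to a stored view column (the `withOneLevelTableAndColumnAndQualifier` branch) and `slot:<name>` marks one that only receives the view qualifier (the `withQualifier` branch).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the guarded loop in
// LogicalView.computeOutput(); strings replace Doris's Slot/Column types.
class GuardedViewOutput {

    static List<String> computeOutput(List<String> childOutput, List<String> fullSchema) {
        List<String> out = new ArrayList<>(childOutput.size());
        for (int i = 0; i < childOutput.size(); i++) {
            if (fullSchema == null || fullSchema.isEmpty() || i >= fullSchema.size()) {
                // Analogue of originSlot.withQualifier(fullQualifiers):
                // no stored column exists for this position.
                out.add("slot:" + childOutput.get(i));
            } else {
                // Analogue of withOneLevelTableAndColumnAndQualifier(view, fullSchema.get(i), ...).
                out.add("col:" + fullSchema.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> stored = List.of("a", "b", "c");     // stored HMS view schema
        List<String> child = List.of("a", "b", "c", "d"); // re-analyzed SELECT * output
        // Prints [col:a, col:b, col:c, slot:d]
        System.out.println(computeOutput(child, stored));
    }
}
```

Under this guard the extra `d` slot from the re-analyzed view body survives with only the view qualifier instead of crashing the query.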
   
   One semantic point should be made explicit before merging. If Doris intends HMS external views to follow the re-analyzed view SQL, then falling back to `withQualifier()` for newly expanded slots is consistent with the current analyzer behavior and avoids losing the new column. If Doris intends the stored HMS view schema to be the authoritative output contract until `REFRESH TABLE <view>`, then the fix should instead reconcile (or cap) the child output to the stored view schema rather than only guarding the index. Either way, the current uncaught `IndexOutOfBoundsException` is a bug.
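If the stored schema is chosen as the authoritative contract, the capping variant could look roughly like the following. This is a hypothetical sketch, not a proposed patch: `CappedViewOutput` is an illustrative name, strings stand in for `Slot`/`Column`, and a real fix would also need to project away the extra child slots in the plan rather than just shortening a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "stored view schema is authoritative" contract:
// extra re-analyzed child slots are hidden until REFRESH TABLE <view>.
class CappedViewOutput {

    static List<String> computeOutput(List<String> childOutput, List<String> fullSchema) {
        List<String> out = new ArrayList<>();
        if (fullSchema == null || fullSchema.isEmpty()) {
            // Mirror the existing empty-schema guard: qualifier-only slots from the child.
            for (String slot : childOutput) {
                out.add("slot:" + slot);
            }
            return out;
        }
        // Never expose more slots than the stored view schema declares.
        int n = Math.min(childOutput.size(), fullSchema.size());
        for (int i = 0; i < n; i++) {
            out.add("col:" + fullSchema.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> stored = List.of("a", "b", "c");
        List<String> child = List.of("a", "b", "c", "d");
        // Prints [col:a, col:b, col:c]; the new base column d stays hidden.
        System.out.println(computeOutput(child, stored));
    }
}
```

Capping by position assumes the new base column was appended after the existing ones. If `SELECT *` expansion can insert or reorder columns, reconciling by column name would likely be safer.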
   
   Recommended next steps:
   
   1. Add a regression test for an HMS external view whose stored view schema has fewer columns than the re-analyzed view body output. The key assertion is that `LogicalView.computeOutput()` does not throw when `childOutput.size() > view.getFullSchema().size()`.
   2. Read `view.getFullSchema()` once into a local variable in `computeOutput()` so the loop does not repeatedly fetch a potentially cache-backed schema.
   3. Confirm the intended result-column contract for `SELECT *` external views after base-table schema drift: either expose the newly added base column once the base table is refreshed, or keep the old view schema until the view itself is refreshed.
   4. Keep the documented workaround for affected users: run `REFRESH TABLE <view>` or recreate the view after changing the underlying Hive table schema.
   
   The issue currently has no labels; it should probably be routed to the Nereids and external catalog/HMS view owners.