xxubai opened a new pull request, #4104:
URL: https://github.com/apache/amoro/pull/4104

   When a table's optimizer group configuration is changed (e.g., from 
"old_group" to "default"), if the AMS restarts before the optimizing process is 
properly closed, the table runtime may remain stuck in `PENDING` status 
forever. This happens in two scenarios:
   
   - AMS restart with non-existent resource group: During 
`loadOptimizingQueues`, tables whose persisted optimizer group no longer exists 
are left in a `groupToTableRuntimes` map but never released. The previous code 
only logged a warning without taking any corrective action.
   - Runtime config change to non-existent group: When `handleConfigChanged` is 
triggered and the table's new optimizer group does not exist, there is no 
fallback to release the table's optimizing process, causing it to hang 
indefinitely.
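
   The restart scenario above can be sketched as follows. This is a minimal, 
self-contained illustration and not the code in this PR: `TableRuntime`, 
`Status`, and `releaseOrphans` are simplified stand-ins invented for the 
example, and only `completeEmptyProcess()` and the `groupToTableRuntimes` map 
take their names from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrphanedTableReleaseSketch {

  // Simplified stand-in: only the statuses named in this PR description.
  enum Status { IDLE, PLANNING, PENDING }

  // Hypothetical, stripped-down table runtime; not Amoro's real class.
  static class TableRuntime {
    final String name;
    Status status;

    TableRuntime(String name, Status status) {
      this.name = name;
      this.status = status;
    }

    // Assumed effect: end the process without doing any work and return to IDLE.
    void completeEmptyProcess() {
      status = Status.IDLE;
    }
  }

  // Releases PLANNING/PENDING tables whose group no longer exists.
  // Returns the names of the tables that were released.
  static List<String> releaseOrphans(
      Map<String, List<TableRuntime>> groupToTableRuntimes, Set<String> existingGroups) {
    List<String> released = new ArrayList<>();
    for (Map.Entry<String, List<TableRuntime>> entry : groupToTableRuntimes.entrySet()) {
      if (existingGroups.contains(entry.getKey())) {
        continue; // the group exists, so its queue still owns these tables
      }
      for (TableRuntime table : entry.getValue()) {
        if (table.status == Status.PLANNING || table.status == Status.PENDING) {
          table.completeEmptyProcess();
          released.add(table.name);
        }
      }
    }
    return released;
  }

  public static void main(String[] args) {
    Map<String, List<TableRuntime>> byGroup = new HashMap<>();
    byGroup.put("old_group", List.of(
        new TableRuntime("t1", Status.PENDING),
        new TableRuntime("t2", Status.PLANNING)));
    byGroup.put("default", List.of(new TableRuntime("t3", Status.PENDING)));

    // Only "default" survives the restart; tables under "old_group" are orphaned.
    System.out.println("released: " + releaseOrphans(byGroup, Set.of("default")));
  }
}
```

   In this sketch, tables under a group that still exists are left untouched, 
since their queue is expected to schedule them as usual.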
   
   ## Why are the changes needed?
   
   Close #4103.
   
   ## Brief change log
   
   - **`DefaultOptimizingService.loadOptimizingQueues()`**: Replaced the 
original logic that merely logged a warning for unloaded table runtimes in 
non-existent groups. Now, tables in `PLANNING` or `PENDING` status whose 
resource group does not exist are actively released back to `IDLE` via 
`completeEmptyProcess()`.
   
   - **`DefaultOptimizingService.ConfigChangeHandler.handleConfigChanged()`**: 
After refreshing the table in the new group's queue, added logic to release the 
table from the queue. If the new group's queue does not exist, 
`completeEmptyProcess()` is called directly on the table runtime to prevent it 
from being stuck.
   
   - **`TestOptimizingQueue`**: Added two new test cases:
     - `testReleaseOrphanedPlanningTableOnRestart`: Verifies that a table 
persisted with `PLANNING` status in a non-existent group is correctly released 
to `IDLE` during AMS restart.
     - `testReleaseOrphanedPendingTableOnRestart`: Verifies the same behavior 
for tables persisted with `PENDING` status.
     - Added helper method `simulateLoadOptimizingQueuesForNonExistentGroup()` 
to simulate the `loadOptimizingQueues` logic in tests.
   
   - **`TestDefaultOptimizingService`**: Added two new test cases:
     - `testHandleConfigChangedGroupChanged`: Verifies that when the optimizer 
group changes to a different **existing** group, the table is properly released 
from both old and new queues without exceptions.
     - `testHandleConfigChangedGroupNotExist`: Verifies that when the optimizer 
group changes to a **non-existent** group, `completeEmptyProcess()` is called 
on the table runtime.
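
   The fallback described for `handleConfigChanged()` above might look roughly 
like the sketch below. It is an illustration under stated assumptions, not the 
patch itself: the `TableRuntime` and `OptimizingQueue` classes here are 
hypothetical stubs, the `refreshTable`/`releaseTable` method names and their 
order are assumed from the wording of the change log, and `handleGroupChanged` 
is a made-up helper name.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigChangeFallbackSketch {

  // Hypothetical table runtime that only records the last action applied to it.
  static class TableRuntime {
    final String name;
    String lastAction = "none";

    TableRuntime(String name) {
      this.name = name;
    }

    void completeEmptyProcess() {
      lastAction = "completeEmptyProcess";
    }
  }

  // Hypothetical queue stub; method names are assumptions for this example.
  static class OptimizingQueue {
    final String group;

    OptimizingQueue(String group) {
      this.group = group;
    }

    void refreshTable(TableRuntime table) {
      table.lastAction = "refreshed:" + group;
    }

    void releaseTable(TableRuntime table) {
      table.lastAction = "released:" + group;
    }
  }

  // If the new group's queue exists, refresh and then release the table there;
  // otherwise close the table's process directly so it cannot hang.
  static void handleGroupChanged(
      Map<String, OptimizingQueue> queues, TableRuntime table, String newGroup) {
    OptimizingQueue newQueue = queues.get(newGroup);
    if (newQueue != null) {
      newQueue.refreshTable(table);
      newQueue.releaseTable(table);
    } else {
      table.completeEmptyProcess();
    }
  }

  public static void main(String[] args) {
    Map<String, OptimizingQueue> queues = new HashMap<>();
    queues.put("default", new OptimizingQueue("default"));

    TableRuntime moved = new TableRuntime("t1");
    handleGroupChanged(queues, moved, "default");
    System.out.println(moved.name + " -> " + moved.lastAction);

    TableRuntime orphaned = new TableRuntime("t2");
    handleGroupChanged(queues, orphaned, "missing_group");
    System.out.println(orphaned.name + " -> " + orphaned.lastAction);
  }
}
```

   The two calls in `main` correspond loosely to the cases covered by 
`testHandleConfigChangedGroupChanged` and `testHandleConfigChangedGroupNotExist`.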
   
   ## How was this patch tested?
   
   - [x] Add some test cases that check the changes thoroughly including 
negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [x] Run test locally before making a pull request
   
   ## Documentation
   
   - Does this pull request introduce a new feature? No
   - If yes, how is the feature documented? Not applicable


