xxubai opened a new pull request, #4104:
URL: https://github.com/apache/amoro/pull/4104
When a table's optimizer group configuration is changed (e.g., from
"old_group" to "default"), if the AMS restarts before the optimizing process is
properly closed, the table runtime may remain stuck in `PENDING` status
forever. This happens in two scenarios:
- AMS restart with non-existent resource group: During
`loadOptimizingQueues`, tables whose persisted optimizer group no longer exists
are left in the `groupToTableRuntimes` map but never released. The previous code
only logged a warning without taking any corrective action.
- Runtime config change to non-existent group: When `handleConfigChanged` is
triggered and the table's new optimizer group does not exist, there is no
fallback to release the table's optimizing process, causing it to hang
indefinitely.
## Why are the changes needed?
Close #4103.
## Brief change log
- **`DefaultOptimizingService.loadOptimizingQueues()`**: Replaced the
original logic that merely logged a warning for unloaded table runtimes in
non-existent groups. Now, tables in `PLANNING` or `PENDING` status whose
resource group does not exist are actively released back to `IDLE` via
`completeEmptyProcess()`.
- **`DefaultOptimizingService.ConfigChangeHandler.handleConfigChanged()`**:
After refreshing the table in the new group's queue, added logic to release the
table from the queue. If the new group's queue does not exist,
`completeEmptyProcess()` is called directly on the table runtime to prevent it
from being stuck.
- **`TestOptimizingQueue`**: Added two new test cases and a helper:
  - `testReleaseOrphanedPlanningTableOnRestart`: Verifies that a table
    persisted with `PLANNING` status in a non-existent group is correctly released
    to `IDLE` during AMS restart.
  - `testReleaseOrphanedPendingTableOnRestart`: Verifies the same behavior
    for tables persisted with `PENDING` status.
  - Added helper method `simulateLoadOptimizingQueuesForNonExistentGroup()`
    to simulate the `loadOptimizingQueues` logic in tests.
- **`TestDefaultOptimizingService`**: Added two new test cases:
  - `testHandleConfigChangedGroupChanged`: Verifies that when the optimizer
    group changes to a different **existing** group, the table is properly released
    from both old and new queues without exceptions.
  - `testHandleConfigChangedGroupNotExist`: Verifies that when the optimizer
    group changes to a **non-existent** group, `completeEmptyProcess()` is called
    on the table runtime.
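
For reviewers who want the intent of the restart path at a glance, here is a minimal, self-contained sketch of the release rule described above. It is not the code in this PR: `OrphanedTableReleaseSketch`, its nested `TableRuntime` and `Status` types, and `releaseOrphaned` are hypothetical stand-ins, and only the names `groupToTableRuntimes` and `completeEmptyProcess()` are borrowed from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT Amoro's real classes.
class OrphanedTableReleaseSketch {

  enum Status { IDLE, PENDING, PLANNING, RUNNING }

  /** Simplified table runtime that keeps only the fields this sketch needs. */
  static final class TableRuntime {
    final String name;
    Status status;

    TableRuntime(String name, Status status) {
      this.name = name;
      this.status = status;
    }

    /** Stand-in for completeEmptyProcess(): moves the table back to IDLE. */
    void completeEmptyProcess() {
      status = Status.IDLE;
    }
  }

  /**
   * Releases PLANNING or PENDING tables whose persisted group is not among the
   * existing groups, and returns the names of the tables that were released.
   */
  static List<String> releaseOrphaned(
      Map<String, List<TableRuntime>> groupToTableRuntimes, Set<String> existingGroups) {
    List<String> released = new ArrayList<>();
    for (Map.Entry<String, List<TableRuntime>> entry : groupToTableRuntimes.entrySet()) {
      if (existingGroups.contains(entry.getKey())) {
        continue; // the group still exists, so its queue owns these tables
      }
      for (TableRuntime table : entry.getValue()) {
        if (table.status == Status.PLANNING || table.status == Status.PENDING) {
          table.completeEmptyProcess();
          released.add(table.name);
        }
      }
    }
    return released;
  }
}
```

In this sketch, tables in other states (for example `RUNNING`) are left untouched, mirroring the change log's statement that only `PLANNING` and `PENDING` tables are released.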
## How was this patch tested?
- [x] Add some test cases that check the changes thoroughly including
negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] Run test locally before making a pull request
## Documentation
- Does this pull request introduce a new feature? No
- If yes, how is the feature documented? Not applicable