ArafatKhan2198 opened a new pull request, #9436:
URL: https://github.com/apache/ozone/pull/9436
## What changes were proposed in this pull request?
This PR fixes race conditions that cause Recon to fail during cluster
upgrades when the schema versioning framework adds new database columns.
**Problem:**
During upgrades from older Ozone versions, Recon encounters
`SQLSyntaxErrorException` errors because multiple components attempt to query
the `RECON_TASK_STATUS` table before the upgrade framework has added the new
columns (`last_task_run_status` and `is_current_task_running`).
**Changes:**
1. **ReconTaskStatusUpdaterManager**: Implemented lazy initialization with
   column existence checking
   - Constructor no longer reads the database during Guice injection
   - Added `ensureInitialized()` method that checks if upgrade columns exist
   - If columns are missing, queries only base columns and uses default
     values
   - Allows retry after upgrade completes by not setting
     `initialized = true` on failure
2. **ReconServer**: Deferred metrics registration until after schema upgrades
   - Moved `reconTaskStatusMetrics.register()` from `start()` to after
     `finalizeLayoutFeatures()`
   - Ensures metrics system doesn't query incomplete schema
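
For reviewers who want the shape of the lazy-initialization change without reading the diff, here is a minimal sketch. It is not the actual `ReconTaskStatusUpdaterManager` code: the class name, the `LoadResult` enum, and the column supplier (a stand-in for a real schema lookup) are hypothetical. It also assumes that the fallback path leaves the `initialized` flag unset so a later call re-checks the schema; the description above only states that the flag is not set on failure.

```java
import java.util.Set;
import java.util.function.Supplier;

/**
 * Illustrative sketch only; not the actual ReconTaskStatusUpdaterManager.
 * The class name, LoadResult enum, and column supplier are hypothetical.
 */
public class LazyTaskStatusManager {

  /** Outcome of one ensureInitialized() call (hypothetical, for illustration). */
  public enum LoadResult { ALREADY_INITIALIZED, BASE_COLUMNS_WITH_DEFAULTS, FULL_SCHEMA }

  /** Column names taken from the PR description. */
  public static final Set<String> UPGRADE_COLUMNS =
      Set.of("last_task_run_status", "is_current_task_running");

  // Stand-in for a schema lookup against the RECON_TASK_STATUS table.
  private final Supplier<Set<String>> currentColumns;
  private boolean initialized = false;

  public LazyTaskStatusManager(Supplier<Set<String>> currentColumns) {
    // Deliberately no database access here: the constructor may run during
    // dependency injection, before the schema upgrade has happened.
    this.currentColumns = currentColumns;
  }

  public synchronized LoadResult ensureInitialized() {
    if (initialized) {
      return LoadResult.ALREADY_INITIALIZED;
    }
    if (!currentColumns.get().containsAll(UPGRADE_COLUMNS)) {
      // Pre-upgrade schema: a real implementation would read only the base
      // columns and fill in defaults. Assumption: the flag stays false so a
      // later call re-checks once the upgrade has added the columns.
      return LoadResult.BASE_COLUMNS_WITH_DEFAULTS;
    }
    // Post-upgrade schema: a real implementation would read every column here.
    initialized = true;
    return LoadResult.FULL_SCHEMA;
  }

  public synchronized boolean isInitialized() {
    return initialized;
  }
}
```

In this sketch, a call made before the upgrade takes the base-column path and leaves the manager uninitialized, so the first call made after the columns exist performs the full load. Deferring metrics registration in `ReconServer` serves the same goal from the other side: nothing triggers the full-column query until the schema is ready.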
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-14079
## How was this patch tested?
## Verification
The fix was validated on an upgrade scenario. All four upgrade handlers
executed successfully without errors:
```
2025-12-03 18:43:35,324 INFO ReconLayoutVersionManager: Current MLV: -1. SLV: 4. Checking features for registration...
2025-12-03 18:43:35,347 INFO ReconSchemaVersionTableManager: Inserted new schema version '0'.
2025-12-03 18:43:35,383 INFO ReconLayoutVersionManager: Feature versioned 0 finalized successfully.
2025-12-03 18:43:35,394 INFO ReconSchemaVersionTableManager: Updated schema version to '1'.
2025-12-03 18:43:35,396 INFO ReconTaskStatusTableUpgradeAction: Adding 'last_task_run_status' column to task status table
2025-12-03 18:43:35,399 INFO ReconTaskStatusTableUpgradeAction: Adding 'is_current_task_running' column to task status table
2025-12-03 18:43:35,408 INFO ReconTaskStatusTableUpgradeAction: Updated 9 rows with default value for new columns
2025-12-03 18:43:35,425 INFO ReconLayoutVersionManager: Feature versioned 1 finalized successfully.
2025-12-03 18:43:35,427 INFO ReconSchemaVersionTableManager: Updated schema version to '2'.
2025-12-03 18:43:35,441 INFO ReconLayoutVersionManager: Feature versioned 2 finalized successfully.
2025-12-03 18:43:35,442 INFO ReconSchemaVersionTableManager: Updated schema version to '3'.
2025-12-03 18:43:35,443 INFO NSSummaryAggregatedTotalsUpgrade: Triggering asynchronous NSSummary tree rebuild for materialized totals (upgrade action).
2025-12-03 18:43:35,485 INFO ReconLayoutVersionManager: Feature versioned 3 finalized successfully.
2025-12-03 18:43:35,487 INFO ReconSchemaVersionTableManager: Updated schema version to '4'.
2025-12-03 18:43:35,843 INFO ReplicatedSizeOfFilesUpgradeAction: Completed full rebuild of NSSummary for REPLICATED_SIZE_OF_FILES upgrade.
2025-12-03 18:43:35,813 INFO ReconTaskControllerImpl: Re-initialization of tasks completed successfully.
```
All schema versions (0 → 4) were applied successfully, including the
critical `TASK_STATUS_STATISTICS` upgrade (which added the new columns) and
the `NSSUMMARY_AGGREGATED_TOTALS` upgrade (which triggered the tree rebuild).
No `SQLSyntaxErrorException` or `NullPointerException` occurred, confirming
that the race conditions are resolved.