[
https://issues.apache.org/jira/browse/HDDS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Priyesh K updated HDDS-14844:
-----------------------------
Description:
The following scenario can occur in Recon:
T1: Recon starts → OmTableInsightTask singleton created → {{init()}} called
once: reads from DB → {{objectCountMap = {keyTableCount: 100}}}
T2: Normal delta events arrive → {{process()}} called on the singleton →
{{objectCountMap}} grows: {{{keyTableCount: 150}}}
T3: OM compaction ({{SequenceNumberNotFoundException}}) triggers internal
reinit
T4: {{reInitializeTasks()}} called → Creates temporary staged task via
{{getStagedTask()}} → Temp task runs {{reprocess()}} → counts 160 keys in new
OM snapshot → Writes 160 to staging DB
T5: Swap succeeds → {{reconDBProvider.replaceStagedDb(...)}} — production DB
now has 160 → {{reconGlobalStatsManager.reinitialize(...)}} — points to new DB
T6: Temp staged task is garbage collected and removed → Singleton task:
{{objectCountMap}} still = {{{keyTableCount: 150}}} (STALE)
T7: New delta event arrives (1 key added in OM) → {{processOMUpdateBatch()}}
calls {{task.process(events, ...)}} on singleton → {{process()}} checks: {{if
(tables == null || tables.isEmpty())}} → FALSE (tables was set in T1, never
cleared) → Skips {{init()}}, keeps using stale maps →
{{objectCountMap.computeIfPresent: 150 + 1 = 151}}
T8: {{writeDataToDB}} writes 151 to production DB → WRONG! Correct answer is
161 (160 from reprocess + 1 new key)
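As a rough illustration, the arithmetic of T1 through T8 can be replayed with a toy model. This is only a sketch: {{CountingTask}}, {{runTimeline()}} and the plain maps standing in for the Recon DBs are hypothetical names, not the actual OmTableInsightTask code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the T1-T8 timeline. Hypothetical names; not Recon code. */
class StaleCountSketch {

  /** Stand-in for a task that caches counts in memory, like objectCountMap. */
  static class CountingTask {
    private final Map<String, Long> objectCountMap = new HashMap<>();
    private boolean initialized = false;

    /** Delta path: loads the cache from the DB only on first use. */
    void process(Map<String, Long> db, long keysAdded) {
      if (!initialized) {
        objectCountMap.putAll(db);
        initialized = true;
      }
      objectCountMap.computeIfPresent("keyTableCount", (k, v) -> v + keysAdded);
      db.putAll(objectCountMap); // stand-in for writeDataToDB
    }

    /** Full rebuild path: recounts from an OM snapshot and persists the result. */
    void reprocess(Map<String, Long> db, long keysInSnapshot) {
      objectCountMap.put("keyTableCount", keysInSnapshot);
      initialized = true;
      db.putAll(objectCountMap);
    }
  }

  /** Replays the timeline and returns the count persisted at T8. */
  static long runTimeline() {
    Map<String, Long> productionDb = new HashMap<>();
    productionDb.put("keyTableCount", 100L);        // T1: persisted count
    CountingTask singleton = new CountingTask();
    singleton.process(productionDb, 50);            // T1-T2: init, then 100 -> 150

    Map<String, Long> stagingDb = new HashMap<>();  // T4: staged task, staging DB
    new CountingTask().reprocess(stagingDb, 160);
    productionDb = stagingDb;                       // T5: swap, production has 160

    singleton.process(productionDb, 1);             // T7: stale 150 + 1
    return productionDb.get("keyTableCount");       // T8: 151 instead of 161
  }

  public static void main(String[] args) {
    System.out.println(runTimeline()); // prints 151
  }
}
```

The staged task that produced 160 is never consulted again, so the singleton's cached 150 overwrites the freshly rebuilt value.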
To fix this, we must either update reconOmTasks with the newly created tasks
after reprocess is called, or reinitialize the in-memory maps.
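A minimal sketch of both options, assuming a simplified task that receives its DB as a parameter. All names here ({{ReinitFixSketch}}, {{CountingTask}}, {{registeredTasks}}) are hypothetical and plain maps stand in for the Recon DB; the real change may look different.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two proposed fixes. Hypothetical names; not Recon code. */
class ReinitFixSketch {
  static final String TASK = "OmTableInsightTask";
  static final String KEY = "keyTableCount";

  static class CountingTask {
    final Map<String, Long> objectCountMap = new HashMap<>();

    /** Reloads the in-memory counts from the given DB. */
    void init(Map<String, Long> db) {
      objectCountMap.clear();
      objectCountMap.putAll(db);
    }

    /** Delta path: initializes lazily, then applies the delta and persists. */
    void process(Map<String, Long> db, long keysAdded) {
      if (objectCountMap.isEmpty()) {
        init(db);
      }
      objectCountMap.computeIfPresent(KEY, (k, v) -> v + keysAdded);
      db.putAll(objectCountMap);
    }

    /** Full rebuild path: recounts from a snapshot and persists the result. */
    void reprocess(Map<String, Long> db, long keysInSnapshot) {
      objectCountMap.clear();
      objectCountMap.put(KEY, keysInSnapshot);
      db.putAll(objectCountMap);
    }
  }

  /** Builds a singleton whose cache holds the pre-reinit count of 150. */
  static CountingTask staleSingleton() {
    Map<String, Long> oldDb = new HashMap<>();
    oldDb.put(KEY, 150L);
    CountingTask task = new CountingTask();
    task.init(oldDb);
    return task;
  }

  /** Option A: register the staged task in place of the stale singleton. */
  static long swapRegisteredTask() {
    Map<String, CountingTask> registeredTasks = new HashMap<>();
    registeredTasks.put(TASK, staleSingleton());

    Map<String, Long> productionDb = new HashMap<>(); // staging DB, then swapped in
    CountingTask staged = new CountingTask();
    staged.reprocess(productionDb, 160);
    registeredTasks.put(TASK, staged);                // the fix: replace the task

    registeredTasks.get(TASK).process(productionDb, 1);
    return productionDb.get(KEY);
  }

  /** Option B: keep the singleton but reload its in-memory maps after the swap. */
  static long resetInMemoryMaps() {
    CountingTask singleton = staleSingleton();

    Map<String, Long> productionDb = new HashMap<>(); // staging DB, then swapped in
    new CountingTask().reprocess(productionDb, 160);
    singleton.init(productionDb);                     // the fix: reload from new DB

    singleton.process(productionDb, 1);
    return productionDb.get(KEY);
  }

  public static void main(String[] args) {
    System.out.println(swapRegisteredTask()); // prints 161
    System.out.println(resetInMemoryMaps());  // prints 161
  }
}
```

In this toy model both options give 161. Which one fits Recon better likely depends on whether the staged task can safely keep serving delta events after the swap.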
was:
The following scenario can occur in Recon:
T1: Recon starts → OmTableInsightTask singleton created → {{init()}} called
once: reads from DB → {{objectCountMap = {keyTableCount: 100}}}
T2: Normal delta events arrive → {{process()}} called on the singleton →
{{objectCountMap}} grows: {{{keyTableCount: 150}}}
T3: OM compaction ({{SequenceNumberNotFoundException}}) triggers internal
reinit
T4: {{reInitializeTasks()}} called → Creates temporary staged task via
{{getStagedTask()}} → Temp task runs {{reprocess()}} → counts 160 keys in new
OM snapshot → Writes 160 to staging DB
T5: Swap succeeds → {{reconDBProvider.replaceStagedDb(...)}} — production DB
now has 160 → {{reconGlobalStatsManager.reinitialize(...)}} — points to new DB
T6: Temp staged task is garbage collected and removed → Singleton task:
{{objectCountMap}} still = {{{keyTableCount: 150}}} — STALE
T7: New delta event arrives (1 key added in OM) → {{processOMUpdateBatch()}}
calls {{task.process(events, ...)}} on singleton → {{process()}} checks: {{if
(tables == null || tables.isEmpty())}} → FALSE (tables was set in T1, never
cleared) → Skips {{init()}}, keeps using stale maps →
{{objectCountMap.computeIfPresent: 150 + 1 = 151}}
T8: {{writeDataToDB}} writes 151 to production DB → WRONG! Correct answer is
161 (160 from reprocess + 1 new key)
To fix this, we have to update reconOmTasks with the newly created tasks after
reprocess is called.
> Update reconOmTasks with newly created tasks after reprocess
> ------------------------------------------------------------
>
> Key: HDDS-14844
> URL: https://issues.apache.org/jira/browse/HDDS-14844
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Recon
> Reporter: Priyesh K
> Assignee: Priyesh K
> Priority: Major
> Labels: pull-request-available