[ 
https://issues.apache.org/jira/browse/HDDS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-14844:
----------------------------------
    Labels: pull-request-available  (was: )

> Update reconOmTasks with newly created tasks after reprocess
> ------------------------------------------------------------
>
>                 Key: HDDS-14844
>                 URL: https://issues.apache.org/jira/browse/HDDS-14844
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Recon
>            Reporter: Priyesh K
>            Assignee: Priyesh K
>            Priority: Major
>              Labels: pull-request-available
>
> The following scenario can occur in Recon:
> T1: Recon starts → OmTableInsightTask singleton created → {{init()}} called 
> once: reads from DB → {{objectCountMap = \{keyTableCount: 100}}}
> T2: Normal delta events arrive → {{process()}} called on the singleton → 
> {{objectCountMap}} grows: {{{keyTableCount: 150}}}
> T3: OM compaction ({{SequenceNumberNotFoundException}}) triggers internal 
> reinit
> T4: {{reInitializeTasks()}} called → Creates temporary staged task via 
> {{getStagedTask()}} → Temp task runs {{reprocess()}} → counts 160 keys in new 
> OM snapshot → Writes 160 to staging DB
> T5: Swap succeeds → {{reconDBProvider.replaceStagedDb(...)}} — production DB 
> now has 160 → {{reconGlobalStatsManager.reinitialize(...)}} — points to new DB
> T6: Temp staged task is garbage collected and removed → Singleton task: 
> {{objectCountMap}} still = {{{keyTableCount: 150}}} — STALE
> T7: New delta event arrives (1 key added in OM) → {{processOMUpdateBatch()}} 
> calls {{task.process(events, ...)}} on singleton → {{process()}} checks: {{if 
> (tables == null || tables.isEmpty())}} → FALSE (tables was set in T1, never 
> cleared) → Skips {{init()}}, keeps using stale maps → 
> {{objectCountMap.computeIfPresent: 150 + 1 = 151}}
> T8: {{writeDataToDB}} writes 151 to production DB → WRONG! Correct answer is 
> 161 (160 from reprocess + 1 new key)
>  
> To fix this, reconOmTasks must be updated with the newly created tasks after 
> reprocess is called.
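
The T1 to T8 sequence above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: CountTask, the "insight" registry key, and run() are hypothetical names invented for illustration and are not the actual Ozone Recon classes or APIs. A single in-memory map stands in for both the staging and production DB, so the staged DB swap is collapsed into one write. The sketch shows the stale cached count when the registered task is left in place, and the corrected count when the registry entry is replaced with the staged task after reprocess.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the scenario; not the real Recon code. */
class StaleTaskSketch {

  /** Stand-in for a Recon OM task that caches a count in memory. */
  static class CountTask {
    private final Map<String, Long> db;  // stand-in for the Recon DB
    private Long cachedCount;            // loaded lazily, like objectCountMap

    CountTask(Map<String, Long> db) {
      this.db = db;
    }

    /** Full rebuild from an OM snapshot: recount and persist. */
    void reprocess(long keysInSnapshot) {
      cachedCount = keysInSnapshot;
      db.put("keyTableCount", cachedCount);
    }

    /** Delta update: initialise from the DB only on first use, then add. */
    void process(long delta) {
      if (cachedCount == null) {
        cachedCount = db.getOrDefault("keyTableCount", 0L);
      }
      cachedCount += delta;
      db.put("keyTableCount", cachedCount);
    }
  }

  /** Replays T1..T8 and returns the count finally written to the DB. */
  static long run(boolean replaceRegisteredTask) {
    Map<String, Long> db = new HashMap<>();
    db.put("keyTableCount", 100L);                 // T1: DB holds 100 keys

    Map<String, CountTask> registry = new HashMap<>();
    registry.put("insight", new CountTask(db));    // T1: singleton registered

    registry.get("insight").process(50);           // T2: cached count is 150

    CountTask staged = new CountTask(db);          // T4: temporary staged task
    staged.reprocess(160);                         // T4/T5: DB now holds 160

    if (replaceRegisteredTask) {
      registry.put("insight", staged);             // proposed fix: swap the task
    }

    registry.get("insight").process(1);            // T7: one new key arrives
    return db.get("keyTableCount");                // T8: value written to DB
  }

  public static void main(String[] args) {
    System.out.println("without swap: " + run(false));
    System.out.println("with swap:    " + run(true));
  }
}
```

In this model, run(false) ends with 151 in the DB (stale 150 plus 1), while run(true) ends with 161 (160 from reprocess plus 1), matching the numbers in the description.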



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
