[ 
https://issues.apache.org/jira/browse/HDDS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyesh K updated HDDS-14844:
-----------------------------
    Description: 
The following scenario may happen in Recon:

T1: Recon starts → OmTableInsightTask singleton created → {{init()}} called 
once: reads from DB → {{objectCountMap}} = {keyTableCount: 100}

T2: Normal delta events arrive → {{process()}} called on the singleton → 
{{objectCountMap}} grows to {keyTableCount: 150}

T3: OM compaction ({{SequenceNumberNotFoundException}}) triggers internal 
reinit

T4: {{reInitializeTasks()}} called → Creates temporary staged task via 
{{getStagedTask()}} → Temp task runs {{reprocess()}} → counts 160 keys in new 
OM snapshot → Writes 160 to staging DB

T5: Swap succeeds → {{reconDBProvider.replaceStagedDb(...)}} — production DB 
now has 160 → {{reconGlobalStatsManager.reinitialize(...)}} — points to new DB

T6: Temp staged task is garbage collected and removed → Singleton task: 
{{objectCountMap}} still = {keyTableCount: 150} — STALE

T7: New delta event arrives (1 key added in OM) → {{processOMUpdateBatch()}} 
calls {{task.process(events, ...)}} on the singleton → {{process()}} checks 
{{if (tables == null || tables.isEmpty())}} → FALSE (tables was set in T1, 
never cleared) → skips {{init()}}, keeps using the stale maps → 
{{objectCountMap.computeIfPresent: 150 + 1 = 151}}

T8: {{writeDataToDB}} writes 151 to the production DB → WRONG! The correct 
value is 161 (160 from reprocess + 1 new key)
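The T1–T8 timeline above can be condensed into a small, self-contained sketch. The class and map names below mirror the description but are illustrative only, not the actual Recon code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the stale in-memory map described in T1-T8.
// Names are hypothetical; the real state lives in OmTableInsightTask.
public class StaleMapSketch {

    static long simulate() {
        Map<String, Long> objectCountMap = new HashMap<>(); // singleton's in-memory state
        Map<String, Long> productionDb = new HashMap<>();   // Recon DB

        objectCountMap.put("keyTableCount", 100L); // T1: init() reads 100 from DB
        objectCountMap.put("keyTableCount", 150L); // T2: delta events grow it to 150
        productionDb.put("keyTableCount", 160L);   // T4/T5: staged reprocess + swap write 160

        // T7: the next delta event is applied to the STALE map (150, not 160)
        objectCountMap.computeIfPresent("keyTableCount", (k, v) -> v + 1);

        // T8: writeDataToDB overwrites the correct 160 with 151
        productionDb.put("keyTableCount", objectCountMap.get("keyTableCount"));
        return productionDb.get("keyTableCount");
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 151; the correct count is 161
    }
}
```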

 

To fix this, we either have to update reconOmTasks with the newly created 
tasks after reprocess is called, or re-initialize the in-memory maps.
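The second option can be sketched as follows. This is only an illustration of the idea (re-seed the in-memory map from the swapped DB, equivalent to running {{init()}} again after T5); the names are hypothetical, not the actual Recon classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed fix: after replaceStagedDb(...) succeeds,
// reload the singleton's in-memory map from the new production DB
// before any further delta events are processed.
public class ReinitFixSketch {

    static long simulate() {
        Map<String, Long> objectCountMap = new HashMap<>();
        Map<String, Long> productionDb = new HashMap<>();

        objectCountMap.put("keyTableCount", 150L); // stale in-memory state (T2)
        productionDb.put("keyTableCount", 160L);   // after reprocess + swap (T5)

        // Fix: re-initialize the in-memory map from the swapped DB
        objectCountMap.clear();
        objectCountMap.putAll(productionDb);

        // The next delta event now starts from 160 ...
        objectCountMap.computeIfPresent("keyTableCount", (k, v) -> v + 1);
        productionDb.put("keyTableCount", objectCountMap.get("keyTableCount"));
        return productionDb.get("keyTableCount"); // ... so writeDataToDB persists 161
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 161
    }
}
```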



> Update reconOmTasks with newly created tasks after reprocess
> ------------------------------------------------------------
>
>                 Key: HDDS-14844
>                 URL: https://issues.apache.org/jira/browse/HDDS-14844
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Recon
>            Reporter: Priyesh K
>            Assignee: Priyesh K
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
