Devesh Kumar Singh created HDDS-11688:
-----------------------------------------
Summary: Ozone Recon - Improve processing reliability of OM DB
events by Recon' background tasks
Key: HDDS-11688
URL: https://issues.apache.org/jira/browse/HDDS-11688
Project: Apache Ozone
Issue Type: Task
Components: Ozone Recon
Reporter: Devesh Kumar Singh
Assignee: Devesh Kumar Singh
Recon periodically and incrementally syncs a set of OM DB events, then runs
several tasks over those events to derive insights about OM DB data. Each task
processes the OM DB events sequentially, so it is important to know, for each
task, how many events it has processed and how many remain, relative to the
current OM DB sequence number that Recon has pulled from OM DB. Currently,
each task processes the whole set of events, and if any single event fails,
Recon marks the whole task as failed and re-runs it up to 2 more times against
the same set of events.
Below are the steps:
1. The task tries to process the incremental set of events.
2. If the task fails in step #1, it is retried with the same set of events. If
it succeeds, no further action is needed.
3. If step #2 also fails with the same set of events, the task reprocesses the
full records of the respective OM DB table.
4. The issue is that if step #3 also fails at any point, the synced
incremental events are currently ignored and Recon simply waits for the next
periodic sync of events from OM DB. This edge case needs to be handled more
carefully and efficiently to make Recon data more reliable.
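
The four steps above can be sketched roughly as follows. This is only an
illustration of the described control flow, not Recon's actual code: the
SketchTask interface, the CurrentFlowSketch class, and the Outcome values are
hypothetical names, and events are modeled as plain strings.

```java
import java.util.List;

// Hypothetical stand-in for a Recon task; not Recon's actual task interface.
interface SketchTask {
    // Incremental pass over the synced events; returns true on success.
    boolean processEvents(List<String> events);

    // Full pass over the task's OM DB table; returns true on success.
    boolean reprocessFullTable();
}

final class CurrentFlowSketch {
    enum Outcome { INCREMENTAL_OK, RETRY_OK, FULL_REPROCESS_OK, EVENTS_DROPPED }

    // Mirrors steps 1 to 4 as described in this issue.
    static Outcome run(SketchTask task, List<String> events) {
        if (task.processEvents(events)) {
            return Outcome.INCREMENTAL_OK;      // step 1 succeeded
        }
        if (task.processEvents(events)) {
            return Outcome.RETRY_OK;            // step 2: same events, retried
        }
        if (task.reprocessFullTable()) {
            return Outcome.FULL_REPROCESS_OK;   // step 3: full table reprocess
        }
        // Step 4: nothing remembers this failure, so the batch is ignored
        // and the task simply waits for the next periodic sync.
        return Outcome.EVENTS_DROPPED;
    }
}
```

The last branch is the edge case in question: once EVENTS_DROPPED is reached,
the task's derived data may no longer match OM DB, and nothing forces a
correction on the next run.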
Proposed way to handle:
If a task failed in its last run, then in its next run the task should
process the full OM DB snapshot to bring its processed data back to a
normalized state.
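
A minimal sketch of this proposal, assuming the failure status of each task
is remembered between runs. The SnapshotAwareTask interface and the
FailureAwareRunner class are hypothetical names used for illustration only;
a real implementation would presumably persist the status rather than keep
it in memory.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical task shape; not Recon's actual task interface.
interface SnapshotAwareTask {
    String name();

    // Incremental pass over the synced events; returns true on success.
    boolean processEvents(List<String> events);

    // Full pass over the OM DB snapshot; returns true on success.
    boolean reprocessFullSnapshot();
}

final class FailureAwareRunner {
    // Names of tasks whose most recent run failed (in memory, for the sketch).
    private final Set<String> failedLastRun = new HashSet<>();

    // Runs one cycle: a task that failed last time gets a full snapshot
    // reprocess instead of the incremental events.
    boolean runOnce(SnapshotAwareTask task, List<String> events) {
        boolean ok = failedLastRun.contains(task.name())
            ? task.reprocessFullSnapshot()
            : task.processEvents(events);
        if (ok) {
            failedLastRun.remove(task.name());
        } else {
            failedLastRun.add(task.name());
        }
        return ok;
    }

    boolean isMarkedFailed(String taskName) {
        return failedLastRun.contains(taskName);
    }
}
```

With this shape, a dropped batch is no longer silently lost: the failure
mark stays set until a full snapshot reprocess succeeds, after which the task
returns to incremental processing.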