[
https://issues.apache.org/jira/browse/HDDS-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal resolved HDDS-13648.
----------------------------------
Fix Version/s: 2.1.0
Resolution: Fixed
> Update NSSummary rebuilding implementation to queue based approach
> ------------------------------------------------------------------
>
> Key: HDDS-13648
> URL: https://issues.apache.org/jira/browse/HDDS-13648
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Devesh Kumar Singh
> Assignee: Devesh Kumar Singh
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.1.0
>
>
> Currently, NSSummary rebuilds are triggered from multiple code paths under
> different scenarios, which can lead to inconsistency. To mitigate that, a
> synchronized approach was implemented so that only one thread at a time
> rebuilds NSSummary. Now that the staging and queue-based implementation is
> in place, the rebuild should also move to a simpler and more consistent
> approach, in line with the existing task executions.
> *The following additional fixes and improvements are also included:*
> # *NSSummaryTask.java:273 Reliability* The reprocess task might terminate
> prematurely if shutdown is slow. executorService.shutdown() is called without
> awaitTermination. Potentially misleading success logs if shutdown is
> interrupted. P3 L Add executorService.awaitTermination with a reasonable
> timeout after shutdown().
> # *NSSummaryTask.java:253 Perf->Reliability* Slower reprocess execution due
> to thread starvation. Fixed-size thread pool (n=2) is smaller than the number
> of parallel tasks (n=3). Increased latency for full namespace reprocessing,
> delaying Recon's data availability.
> # *NSSummaryTask.java:267 (Ignoring InterruptedException)*
> When a thread is interrupted during Future.get(), the InterruptedException is
> caught, but the thread's interrupted status is not restored.
>
> {code:java}
> } catch (InterruptedException | ExecutionException ex) {
>   LOG.error("Error while reprocessing NSSummary table in Recon DB.", ex);
>   REBUILD_STATE.set(RebuildState.FAILED);
>   return buildTaskResult(false); // Missing Thread.currentThread().interrupt();
> }{code}
>
> # *NSSummaryTaskDbEventHandler.java:264*: On DB write failure, log the
> size of the failed batch.
> {code:java}
> LOG.error("Unable to write Namespace Summary data in Recon DB. batchSize={}",
> nsSummaryMap.size(), e);{code}
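
The first three items above (awaiting termination after shutdown(), sizing the pool to the number of parallel tasks, and restoring the interrupted status) can be sketched together as follows. This is only a minimal illustration, not the actual NSSummaryTask code: the class name ReprocessSketch, the method runAll, and the timeout parameter are hypothetical, and the real change may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names here are illustrative and not taken from Ozone.
public class ReprocessSketch {

  /** Runs all tasks in parallel; returns true only if every task reported success. */
  public static boolean runAll(List<Callable<Boolean>> tasks, long timeoutSeconds) {
    // Size the pool to the number of tasks so no task waits for a free thread.
    ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, tasks.size()));
    boolean success = true;
    try {
      List<Future<Boolean>> futures = new ArrayList<>();
      for (Callable<Boolean> task : tasks) {
        futures.add(executor.submit(task));
      }
      for (Future<Boolean> future : futures) {
        success &= future.get();
      }
    } catch (InterruptedException ex) {
      // Restore the interrupted status so callers can still observe it.
      Thread.currentThread().interrupt();
      success = false;
    } catch (ExecutionException ex) {
      success = false;
    } finally {
      executor.shutdown();
      try {
        // Wait for a bounded time so success is not reported before
        // the worker threads have actually finished.
        if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
          executor.shutdownNow();
          success = false;
        }
      } catch (InterruptedException ex) {
        executor.shutdownNow();
        Thread.currentThread().interrupt();
        success = false;
      }
    }
    return success;
  }

  public static void main(String[] args) {
    List<Callable<Boolean>> tasks = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      tasks.add(() -> Boolean.TRUE);
    }
    System.out.println("reprocess succeeded: " + runAll(tasks, 30));
  }
}
```

In this sketch a timeout or an interrupt during shutdown is treated as a failure, so a success result is only returned once all workers have finished. Whether the real task should fail or merely log a warning in that case is a design choice the issue text does not settle.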
--
This message was sent by Atlassian Jira
(v8.20.10#820010)