[ https://issues.apache.org/jira/browse/HDDS-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Agrawal resolved HDDS-13648.
----------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

> Update NSSummary rebuilding implementation to queue based approach
> ------------------------------------------------------------------
>
>                 Key: HDDS-13648
>                 URL: https://issues.apache.org/jira/browse/HDDS-13648
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Devesh Kumar Singh
>            Assignee: Devesh Kumar Singh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.0
>
>
> Currently, NSSummary rebuilds are triggered in multiple scenarios and from
> multiple code paths, which can lead to inconsistency. To address this, a
> synchronized approach was implemented so that only one thread at a time builds
> the NSSummary. Now that staging and queue-based task execution are in place,
> NSSummary rebuilding should move to the same simpler, more consistent
> approach used for the existing task executions.
> *Additional fixes and improvements:*
> # *NSSummaryTask.java:273 Reliability* The reprocess task might terminate
> prematurely if shutdown is slow, because executorService.shutdown() is called
> without awaitTermination(). Success logs can also be misleading if shutdown is
> interrupted. (P3 L) Fix: call executorService.awaitTermination() with a
> reasonable timeout after shutdown().
> # *NSSummaryTask.java:253 Perf->Reliability* Reprocess runs slower due to
> thread starvation: the fixed-size thread pool (n=2) is smaller than the number
> of parallel tasks (n=3), so one task must wait for a free thread. This
> increases latency for full namespace reprocessing and delays Recon's data
> availability.
>  # *NSSummaryTask.java:267 (Ignoring InterruptedException)*
> When a thread is interrupted during Future.get(), the InterruptedException is 
> caught, but the thread's interrupted status is not restored.
>     
> {code:java}
> } catch (InterruptedException | ExecutionException ex) {
>   LOG.error("Error while reprocessing NSSummary table in Recon DB.", ex);
>   REBUILD_STATE.set(RebuildState.FAILED);
>   // Missing Thread.currentThread().interrupt();
>   return buildTaskResult(false);
> }{code}
>  
>  # {*}NSSummaryTaskDbEventHandler.java:264{*}: On DB write failure, log the 
> size of the failed batch. 
> {code:java}
> LOG.error("Unable to write Namespace Summary data in Recon DB. batchSize={}",
>     nsSummaryMap.size(), e);{code}
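A minimal sketch of the queue-based rebuild described in the issue, assuming every code path that needs an NSSummary rebuild submits a request to one single-threaded queue instead of entering a synchronized block. RebuildQueueSketch, requestRebuild() and the coalescing of duplicate requests are hypothetical illustrations, not Recon's actual classes or behaviour.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not Recon code): every code path that wants an NSSummary
// rebuild calls requestRebuild(); a single worker thread drains the queue, so
// at most one rebuild runs at a time without synchronized blocks.
class RebuildQueueSketch {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  // True while a rebuild is queued but not yet started; requests arriving in
  // that window are folded into the pending rebuild instead of adding another.
  private final AtomicBoolean pending = new AtomicBoolean(false);
  private final AtomicInteger finished = new AtomicInteger();
  private final Runnable rebuild;

  RebuildQueueSketch(Runnable rebuild) {
    this.rebuild = rebuild;
  }

  /** Returns true if a rebuild was queued, false if one was already pending. */
  boolean requestRebuild() {
    if (!pending.compareAndSet(false, true)) {
      return false;
    }
    worker.execute(() -> {
      pending.set(false);  // requests from now on need a fresh rebuild
      try {
        rebuild.run();
      } finally {
        finished.incrementAndGet();
      }
    });
    return true;
  }

  int finishedRebuilds() {
    return finished.get();
  }

  /** Stops accepting work and waits (bounded) for queued rebuilds to finish. */
  void close() {
    worker.shutdown();
    try {
      worker.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the caller's interrupt status
    }
  }
}
{code}

Serialization comes from the single worker thread rather than a lock, and coalescing keeps a burst of triggers from queuing a burst of full rebuilds.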
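For item 1, a hedged sketch of the shutdown()/awaitTermination() pattern shown in the java.util.concurrent.ExecutorService documentation. The helper name shutdownAndAwait and the fallback to shutdownNow() on timeout are assumptions, not necessarily what the actual fix does; the caller would log success only when it returns true.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper following the shutdown pattern in the ExecutorService javadoc.
class ExecutorShutdownSketch {
  /** Returns true only if all submitted tasks finished within the timeout. */
  static boolean shutdownAndAwait(ExecutorService pool, long timeoutSeconds) {
    pool.shutdown();  // reject new tasks; already-submitted ones keep running
    try {
      if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
        return true;  // all tasks done: safe to log success now
      }
      pool.shutdownNow();  // timed out: interrupt tasks that are still running
      return false;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();  // keep the caller's interrupt status
      return false;
    }
  }
}
{code}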
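For item 2, one way to avoid the starvation is to size the pool from the number of parallel tasks instead of hard-coding 2. ParallelReprocessSketch and runAll() are hypothetical; the real task may size or structure its pool differently.

{code:java}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one thread per parallel reprocess task (3 in the issue)
// instead of a hard-coded pool of 2, so no task waits for a free thread.
class ParallelReprocessSketch {
  static boolean runAll(List<Callable<Boolean>> tasks) {
    if (tasks.isEmpty()) {
      return true;  // newFixedThreadPool(0) would throw IllegalArgumentException
    }
    ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
    try {
      boolean allOk = true;
      // invokeAll blocks until every task has completed (normally or not).
      for (Future<Boolean> future : pool.invokeAll(tasks)) {
        try {
          allOk &= future.get();
        } catch (ExecutionException e) {
          allOk = false;  // this task threw; keep checking the others
        }
      }
      return allOk;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // invokeAll cancels unfinished tasks
      return false;
    } finally {
      pool.shutdown();  // idle worker threads exit
    }
  }
}
{code}

If each of three tasks waits on a shared CountDownLatch of 3, this version completes at once, whereas a pool of 2 would leave the first two tasks blocked until their wait times out.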
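For item 3, a sketch of the corrected catch block that handles InterruptedException separately and restores the interrupt status. REBUILD_STATE and RebuildState.FAILED mirror names from the issue's snippet; RebuildState.IDLE, the class name and the boolean return (standing in for buildTaskResult(false)) are assumptions.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the fixed catch block; IDLE and the boolean result
// are assumed stand-ins, not Recon's real RebuildState or buildTaskResult().
class InterruptAwareReprocess {
  enum RebuildState { IDLE, FAILED }

  static final AtomicReference<RebuildState> REBUILD_STATE =
      new AtomicReference<>(RebuildState.IDLE);

  static boolean awaitReprocess(Future<Boolean> result) {
    try {
      return result.get();
    } catch (InterruptedException ex) {
      // LOG.error(...) would go here, as in the original code.
      REBUILD_STATE.set(RebuildState.FAILED);
      Thread.currentThread().interrupt();  // restore the flag that get() cleared
      return false;
    } catch (ExecutionException ex) {
      REBUILD_STATE.set(RebuildState.FAILED);
      return false;  // task failed; this thread was not interrupted
    }
  }
}
{code}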



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
