devmadhuu opened a new pull request, #9322:
URL: https://github.com/apache/ozone/pull/9322

   ## What changes were proposed in this pull request?
   This PR fixes the following code reliability and data integrity issues, both caused by executors being stopped with `shutdownNow()` without waiting for termination.
   
   **Location: ReconTaskControllerImpl.java**
   
   ```
   public synchronized void stop() {
       LOG.info("Stopping Recon Task Controller.");
       if (this.executorService != null) {
           this.executorService.shutdownNow();  // No awaitTermination
       }
       if (this.eventProcessingExecutor != null) {
           this.eventProcessingExecutor.shutdownNow();  // No awaitTermination
       }
   }
   ```
    
   **Impact:** Service reliability, data integrity
   
   **Likelihood:** High (every service shutdown)
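
To illustrate why the missing wait matters: `ExecutorService.shutdownNow()` interrupts running tasks and returns without waiting for them to finish, so a task that is slow to react to the interrupt can still be running after `stop()` returns. The sketch below is a standalone demo, not Recon code; the class name `ShutdownNowDemo` and the latch-based "stubborn" task are assumptions made purely for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the Recon code base.
class ShutdownNowDemo {

    /** Returns {terminated right after shutdownNow(), terminated after awaitTermination()}. */
    static boolean[] run() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        executor.submit(() -> {
            started.countDown();
            // Simulates work that does not stop on interrupt: keep waiting until released.
            boolean interrupted = false;
            while (true) {
                try {
                    release.await();
                    break;
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            started.await();
            executor.shutdownNow();                              // interrupts the worker, returns at once
            boolean afterShutdownNow = executor.isTerminated();  // the task is still running here
            release.countDown();                                 // let the task finish
            boolean afterAwait = executor.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] {afterShutdownNow, afterAwait};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("terminated right after shutdownNow(): " + result[0]);
        System.out.println("terminated after awaitTermination(): " + result[1]);
    }
}
```

Only the `awaitTermination` call gives the caller a point at which the worker thread is known to have exited.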
   
   **Fix:**
   
   `private static final int SHUTDOWN_TIMEOUT_SECONDS = 30;`
    
   ```
   public synchronized void stop() {
       LOG.info("Stopping Recon Task Controller.");
       shutdownExecutorGracefully(this.executorService, "main task executor");
       shutdownExecutorGracefully(this.eventProcessingExecutor, "event processing executor");
   }
   ```
    
   ```
   private void shutdownExecutorGracefully(ExecutorService executor, String name) {
       if (executor == null) return;
       
       executor.shutdown();
       try {
           if (!executor.awaitTermination(SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS)) {
               LOG.warn("Executor {} did not terminate within {} seconds, forcing shutdown",
                        name, SHUTDOWN_TIMEOUT_SECONDS);
               executor.shutdownNow();
               if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                   LOG.error("Executor {} did not terminate after forced shutdown", name);
               }
           }
       } catch (InterruptedException e) {
           LOG.warn("Interrupted while waiting for {} to terminate", name);
           executor.shutdownNow();
           Thread.currentThread().interrupt();
       }
   }
   ```
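
The two-phase pattern above can be tried outside Recon with a small standalone sketch. It follows the same `shutdown()`, `awaitTermination()`, `shutdownNow()` sequence, but the class name `GracefulShutdownSketch`, the millisecond timeout parameters, and the boolean return value are assumptions added for illustration and are not part of the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical standalone sketch; the patch itself uses a void helper with fixed timeouts.
class GracefulShutdownSketch {

    /** Returns true if the executor terminated within the grace or the forced-shutdown window. */
    static boolean shutdownGracefully(ExecutorService executor, long graceMillis, long forceMillis) {
        if (executor == null) {
            return true;
        }
        executor.shutdown();  // stop accepting new tasks, let submitted ones finish
        try {
            if (executor.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow();  // grace period expired: interrupt running tasks
            return executor.awaitTermination(forceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        // A task that finishes on its own: terminates within the grace period.
        ExecutorService quick = Executors.newSingleThreadExecutor();
        quick.submit(() -> { });
        System.out.println("quick terminated: " + shutdownGracefully(quick, 1000, 1000));

        // A long sleep: needs the forced path, then exits once interrupted.
        ExecutorService sleepy = Executors.newSingleThreadExecutor();
        sleepy.submit(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("sleepy terminated: " + shutdownGracefully(sleepy, 100, 1000));
    }
}
```

Returning a boolean makes the outcome easy to check in a test; the patch logs the outcome instead.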
    
   **Location: OzoneManagerServiceProviderImpl.java**
   
    `scheduler.shutdownNow();  // No awaitTermination`
    
   **Risk:** Scheduler threads may not terminate, causing resource leaks and preventing JVM shutdown.
   
   **Impact:** Resource exhaustion, service restart failures
   
    
   **Fix:**
   
   ```
   private void stopSyncDataFromOMThread() {
       scheduler.shutdown();
       try {
           if (!scheduler.awaitTermination(30, TimeUnit.SECONDS)) {
               scheduler.shutdownNow();
               if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) {
                   LOG.error("OM sync scheduler failed to terminate");
               }
           }
       } catch (InterruptedException e) {
           scheduler.shutdownNow();
           Thread.currentThread().interrupt();
       }
       tarExtractor.stop();
       LOG.debug("Shutdown the OM DB sync scheduler.");
   }
   ```
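
As a rough illustration of the same sequence applied to a scheduler, the sketch below assumes `scheduler` is a `ScheduledExecutorService` running a periodic task, which the description above does not state explicitly. The class name `SchedulerShutdownSketch` and the counter task are hypothetical. With the default policy of `Executors.newScheduledThreadPool`, `shutdown()` cancels future periodic runs, so `awaitTermination` only has to wait for a run that is already in flight.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical standalone sketch; the periodic counter stands in for the OM sync task.
class SchedulerShutdownSketch {

    /** Returns true if the scheduler terminated and the periodic task ran no further. */
    static boolean run() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        scheduler.scheduleWithFixedDelay(runs::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(100);  // let the periodic task run a few times

            scheduler.shutdown();  // no new runs are started after this call
            boolean terminated = scheduler.awaitTermination(5, TimeUnit.SECONDS);
            if (!terminated) {
                scheduler.shutdownNow();  // forced path, mirroring the fix above
                terminated = scheduler.awaitTermination(5, TimeUnit.SECONDS);
            }

            int atShutdown = runs.get();
            Thread.sleep(100);  // if the task were still scheduled, the count would grow
            int later = runs.get();
            return terminated && atShutdown > 0 && atShutdown == later;
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("scheduler stopped cleanly: " + run());
    }
}
```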
   
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/HDDS-13956
   
   ## How was this patch tested?
   This patch was tested with the existing JUnit and integration tests and on a local Docker cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

