pchang388 commented on issue #12701:
URL: https://github.com/apache/druid/issues/12701#issuecomment-1178331450

   After more digging, we found that "202 Accepted" responses to "pause" 
HTTP calls happen often, but the task usually does pause and resume later. 
This specific issue happens when the task never actually pauses. Looking at 
the code here: 
https://github.com/apache/druid/blob/0.22.1/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java
   
   We can see that it is supposed to log a debug message when it eventually 
does pause and resume after a 202:
   ```
     private boolean possiblyPause() throws InterruptedException
     {
       pauseLock.lockInterruptibly();
       try {
         if (pauseRequested) {
           status = Status.PAUSED;
           hasPaused.signalAll();
   
           log.debug("Received pause command, pausing ingestion until resumed.");
           while (pauseRequested) {
             shouldResume.await();
           }
   
           status = Status.READING;
           shouldResume.signalAll();
           log.debug("Received resume command, resuming ingestion.");
           return true;
         }
       }
       finally {
         pauseLock.unlock();
       }
   
       return false;
     }
   ```
   
   For a normal pause with a "202 Accepted" response:
   * Overlord sends HTTP request to Peon's /pause endpoint
   * Peon thread handling HTTP request sets pauseRequested to true (line 1793), 
and blocks waiting for its main thread to call signalAll() on the hasPaused 
Condition (line 1814)
   * Peon's main thread calls possiblyPause() repeatedly (line 579), observes 
that pauseRequested is true (line 1308), calls signalAll() on the hasPaused 
Condition (line 1310), and blocks waiting for another thread to call 
signalAll() on the shouldResume Condition (line 1314)
   * Peon thread handling HTTP request unblocks
   * Another thread calls signalAll() on the shouldResume Condition (line 
1318/1433/1804/1845)
   * The main thread unblocks
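
The handshake described above can be sketched in isolation. This is a minimal, simplified model and not the Druid implementation: the class name `PauseHandshakeSketch`, the integer return codes, and the `pause(timeoutMillis)` and `resume()` bodies are assumptions made for illustration. Only `possiblyPause()` follows the code quoted above. The sketch also assumes the HTTP thread waits with a timed `awaitNanos` on `hasPaused`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the task runner; names other than the
// lock/condition/flag fields are illustrative only.
class PauseHandshakeSketch {
    enum Status { READING, PAUSED }

    private final ReentrantLock pauseLock = new ReentrantLock();
    private final Condition hasPaused = pauseLock.newCondition();
    private final Condition shouldResume = pauseLock.newCondition();
    private volatile boolean pauseRequested = false;
    private volatile Status status = Status.READING;

    // HTTP-handler side (assumed shape): set the flag, then wait up to
    // timeoutMillis for the main loop to signal hasPaused.
    // Returns 200 if the pause was observed in time, 202 otherwise.
    int pause(long timeoutMillis) throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            pauseRequested = true;
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (status != Status.PAUSED) {
                if (nanos <= 0L) {
                    return 202; // accepted, but not yet paused
                }
                nanos = hasPaused.awaitNanos(nanos);
            }
            return 200;
        } finally {
            pauseLock.unlock();
        }
    }

    // Another thread clears the flag and wakes the main loop (assumed shape).
    void resume() throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            pauseRequested = false;
            shouldResume.signalAll();
        } finally {
            pauseLock.unlock();
        }
    }

    // Main-loop side, mirroring the method quoted from the source file.
    boolean possiblyPause() throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            if (pauseRequested) {
                status = Status.PAUSED;
                hasPaused.signalAll();      // unblocks the HTTP thread
                while (pauseRequested) {
                    shouldResume.await();   // releases pauseLock while waiting
                }
                status = Status.READING;
                shouldResume.signalAll();
                return true;
            }
        } finally {
            pauseLock.unlock();
        }
        return false;
    }

    Status status() {
        return status;
    }
}
```

With one thread calling `possiblyPause()` in a loop, a call to `pause(2000)` from a second thread should return 200 once the loop thread reaches its next check, and `resume()` should put the status back to `READING`.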
   
   For cases where the Peon responds with "202 Accepted" but never pauses, we 
never see that debug message. The sequence looks something like this:
   * Peon thread handling HTTP request is interrupted after the 2-second 
timeout (line 1807, 1814) and returns 202 Accepted
   * Main thread SHOULD still observe that pauseRequested is true on its next 
call to possiblyPause()
     * Either something set pauseRequested to false before then, or something 
weird happened with pauseLock.lockInterruptibly() (line 1306) or 
hasPaused.signalAll() (line 1310)
     * We don't see the following log message (line 1312) after a 202 
Accepted response: 
log.debug("Received pause command, pausing ingestion until resumed.");
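
To illustrate why the main thread should still observe the request after a 202, here is a small sketch of the timeout path only. It is an assumption-laden model, not Druid code: `PauseTimeoutSketch`, `requestPause`, and `acknowledgePauseIfRequested` are hypothetical names, and the acknowledge step is a non-blocking stand-in for the first half of `possiblyPause()`. The point it shows is that a timed-out wait leaves `pauseRequested` set, so only some other write to that flag (or a problem around the lock or the signal) would explain the missing pause.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the 202 path; not the Druid implementation.
class PauseTimeoutSketch {
    private final ReentrantLock pauseLock = new ReentrantLock();
    private final Condition hasPaused = pauseLock.newCondition();
    private volatile boolean pauseRequested = false;
    private volatile boolean paused = false;

    // HTTP-handler side: returns 202 if nobody acknowledged in time.
    // Note that the timeout path does NOT clear pauseRequested.
    int requestPause(long timeoutMillis) throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            pauseRequested = true;
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!paused) {
                if (nanos <= 0L) {
                    return 202;
                }
                nanos = hasPaused.awaitNanos(nanos);
            }
            return 200;
        } finally {
            pauseLock.unlock();
        }
    }

    // Simplified, non-blocking stand-in for the main loop's next check.
    boolean acknowledgePauseIfRequested() throws InterruptedException {
        pauseLock.lockInterruptibly();
        try {
            if (!pauseRequested) {
                return false;
            }
            paused = true;
            hasPaused.signalAll();
            return true;
        } finally {
            pauseLock.unlock();
        }
    }

    boolean isPauseRequested() {
        return pauseRequested;
    }

    boolean isPaused() {
        return paused;
    }
}
```

In this model, calling `requestPause(50)` with no main loop running returns 202 and leaves `isPauseRequested()` true, so a later `acknowledgePauseIfRequested()` still pauses. That matches the expectation stated in the bullets above and is what makes the missing debug message surprising.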


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

