Copilot commented on code in PR #18113:
URL: https://github.com/apache/pinot/pull/18113#discussion_r3046198365


##########
pinot-controller/src/main/java/org/apache/pinot/controller/cursors/ResponseStoreCleaner.java:
##########
@@ -114,6 +115,10 @@ private static long getFrequencyInSeconds(ControllerConf 
config) {
 
   @Override
   protected void processTables(List<String> tableNamesWithType, Properties 
periodicTaskProperties) {
+    // Make it so that only one controller is responsible for cleanup.
+    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
+      return;
+    }

Review Comment:
   `ControllerPeriodicTask.runTask()` only invokes `processTables()` when this 
controller is leader for at least one entry returned by `getTablesToProcess()`. 
With the new `isLeaderForTable(TASK_NAME)` gate inside `processTables()`, it’s 
possible that the controller that is leader for `TASK_NAME` is not leader for 
any actual table (e.g., few tables / many controllers), causing 
`processTables()` to never be called on that controller and the cleaner to 
never run. Consider overriding `getTablesToProcess()` to return a singleton 
list containing `TASK_NAME` (or switching this task to `BasePeriodicTask`) so 
exactly one controller runs cleanup independent of table distribution, and then 
remove/avoid the additional early-return gate.



##########
pinot-common/src/main/java/org/apache/pinot/common/cursors/AbstractResponseStore.java:
##########
@@ -233,9 +236,18 @@ public boolean deleteResponse(String requestId) throws 
Exception {
       return false;
     }
 
-    long bytesWritten = readResponse(requestId).getBytesWritten();
+    // Read bytesWritten for metrics tracking. The response may be deleted 
concurrently between exists() and
+    // readResponse() (TOCTOU race), so handle that gracefully.
+    long bytesWritten = 0;
+    try {
+      bytesWritten = readResponse(requestId).getBytesWritten();
+    } catch (Exception e) {
+      LOGGER.debug("Could not read response metadata for requestId={} (may 
have been deleted concurrently)",

Review Comment:
   The intent here is to handle the TOCTOU “deleted between exists() and 
readResponse()” race, but catching a generic `Exception` can also hide non-race 
failures (e.g., deserialization bugs, permission issues, transient FS errors), 
and it will silently skip metrics adjustments in those cases. Consider 
narrowing the catch to the expected “not found”/IO exceptions (or rethrow 
unexpected exceptions) so real read failures aren’t masked.
   ```suggestion
       // readResponse() (TOCTOU race), so only suppress read failures when the 
response no longer exists.
       long bytesWritten = 0;
       try {
         bytesWritten = readResponse(requestId).getBytesWritten();
       } catch (Exception e) {
         if (exists(requestId)) {
           throw e;
         }
         LOGGER.debug("Could not read response metadata for requestId={} 
because it was deleted concurrently",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to