ddanielr commented on code in PR #4840:
URL: https://github.com/apache/accumulo/pull/4840#discussion_r1733364854


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/session/ScanSession.java:
##########
@@ -184,17 +191,71 @@ public TabletResolver getTabletResolver() {
   }
 
   public ScanTask<T> getScanTask() {
-    return scanTask;
+    return scanTaskRef.get();
   }
 
   public void setScanTask(ScanTask<T> scanTask) {
-    this.scanTask = scanTask;
+    Objects.requireNonNull(scanTask);
+    scanTaskRef.getAndUpdate(currScanTask -> {
+      Preconditions.checkState(currScanTask == null,
+          "Unable to set a scan task when one is already set");
+      return scanTask;
+    });
+  }
+
+  public void clearScanTask() {
+    scanTaskRef.getAndUpdate(currScanTask -> {
+      // For tracking zombie scan threads, do not want to clear the scan task if it has an active
+      // thread. When the thread is not null and the task has produced a result, the thread should
+      // be in
+      // the process of clearing itself from the scan task.
+      Preconditions.checkState(
+          currScanTask == null || currScanTask.getScanThread() == null
+              || currScanTask.producedResult(),
+          "Can not clear scan task that is still running and has not produced a result");
+      return null;
+    });
+  }
+
+  private boolean loggedZombieStackTrace = false;
+
+  public void logZombieStackTrace() {
+    Preconditions.checkState(getState() == State.REMOVED);
+    var scanTask = scanTaskRef.get();
+    if (scanTask != null) {
+      ScanTask.ScanThreadStackTrace scanStackTrace = scanTask.getStackTrace();
+      if (scanStackTrace != null && !loggedZombieStackTrace) {
+        var changeTimeMillis = elaspedSinceStateChange(TimeUnit.MILLISECONDS);
+        var exception =

Review Comment:
   I wasn't sure whether the thread ID alone was enough information for troubleshooting, so I tested appending the thread name to the exception message.
   
   The only additional information this showed was the user and the specific tablet.
   
   ```
   2024-08-27T18:14:46,739 [session.ScanSession] WARN : Scan session with no client active for 6194ms has a zombie scan thread. Scan session info : SingleScanSession REMOVED startTime:1724782473239 lastAccessTime:1724782474239 client:127.0.0.1:56496 tableId:1 
   java.lang.Exception: Fake exception to capture stack trace of zombie scan.  Thread id:199 name: User: root Start: 1724782473239 Client: 127.0.0.1:56496 Tablet: 1<<
   ```
   
   @keith-turner do you think we should also include the specific tablet the zombie scan was attempting to read? Or is the table ID enough?
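
For reference, the experiment described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the PR: the class `ZombieTraceSketch`, the helper `captureStackTrace`, and the exact message layout are assumptions made for the example. It relies only on the standard `Thread.getStackTrace()` and `Throwable.setStackTrace()` calls to attach another thread's stack to an exception whose message carries both the thread ID and the thread name.

```java
import java.util.concurrent.CountDownLatch;

final class ZombieTraceSketch {

  // Hypothetical helper (not from the PR): wraps another thread's current
  // stack in an Exception so a logger can print it like a normal stack trace.
  // The message includes the thread id and the thread name.
  static Exception captureStackTrace(Thread thread) {
    // Thread.getId() is deprecated in newer JDKs in favor of threadId(),
    // but it is still available and works on older JDKs as well.
    Exception e = new Exception("Fake exception to capture stack trace of zombie scan. Thread id:"
        + thread.getId() + " name:" + thread.getName());
    e.setStackTrace(thread.getStackTrace());
    return e;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch started = new CountDownLatch(1);
    // Stand-in for a stuck scan thread; the name is made up for the example.
    Thread scan = new Thread(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
      }
    }, "User: root Client: 127.0.0.1:56496 Tablet: 1<<");
    scan.setDaemon(true);
    scan.start();
    started.await();

    // Prints the exception message followed by the scan thread's stack.
    captureStackTrace(scan).printStackTrace();
    scan.interrupt();
  }
}
```

Whether the name adds anything beyond the ID depends entirely on what the scan code puts in the thread name, which is what the question above is about.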


